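I am looking for a pythonic way to split a sentence into words while also storing the index of each word within the sentence. For example:

    a = "This is a sentence"
    b = a.split()  # ["This", "is", "a", "sentence"]

Now I also want to store the index information for every word:

    c = a.splitWithIndices()  # [(0,3), (5,6), (8,8), (10,17)]

What is the best way to implement splitWithIndices()? Does Python have a library method I can use for this? Any approach that helps me compute the word indices would be welcome.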
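I think it is more natural to return the start and end of the corresponding slices, e.g. (0, 4) instead of (0, 3), so that a[start:end] gives the word directly.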
    >>> from itertools import groupby
    >>> def splitWithIndices(s, c=' '):
    ...     p = 0
    ...     for k, g in groupby(s, lambda x: x == c):
    ...         q = p + sum(1 for i in g)
    ...         if not k:
    ...             yield p, q  # or p, q-1 if you are really sure you want that
    ...         p = q
    ...
    >>> a = "This is a sentence"
    >>> list(splitWithIndices(a))
    [(0, 4), (5, 7), (8, 9), (10, 18)]
    >>> a[0:4]
    'This'
    >>> a[5:7]
    'is'
    >>> a[8:9]
    'a'
    >>> a[10:18]
    'sentence'
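As an alternative to the groupby approach above, the same slice-style spans can probably be obtained with the standard re module. This is only a minimal sketch, not part of the original answer: the name split_with_indices is made up for illustration, and it assumes words are separated by any whitespace (like a.split() with no arguments) rather than by a single configurable character.

```python
import re

def split_with_indices(s):
    """Return (start, end) slice bounds for each whitespace-delimited word.

    Hypothetical helper for illustration; assumes any run of whitespace
    separates words, mirroring str.split() with no arguments.
    """
    # Each match of \S+ is one word; span() gives (start, end) with an
    # exclusive end, so s[start:end] recovers the word.
    return [m.span() for m in re.finditer(r"\S+", s)]

a = "This is a sentence"
spans = split_with_indices(a)
words = [a[start:end] for start, end in spans]
```

Because the end index is exclusive, slicing the original string with each pair reproduces the result of a.split(). If inclusive end indices are really wanted, as in the question, subtract 1 from each end.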