Suppose I have a DataFrame like this:
import pandas as pd
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
and I want to check whether each row contains any of a few words. I can just do this:
frame['b'] = frame.a.str.contains("dog") | frame.a.str.contains("cat") | frame.a.str.contains("fish")
frame['b'] outputs:
0     True
1    False
2     True
Name: b, dtype: bool
Now suppose I put the words in a list instead:
mylist =['dog', 'cat', 'fish']
How can I check whether each row contains any of the words in the list?
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black
The str.contains method accepts a regular expression pattern:
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)
pattern
'dog|cat|fish'
frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool
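One caveat with joining the list directly: if a word contains regex metacharacters (for example '.' or '+'), it is interpreted as part of the pattern rather than literally. A minimal sketch, assuming you want plain substring matches; the helper name contains_any is hypothetical and not part of pandas:

```python
import re
import pandas as pd

def contains_any(series, words):
    """Return a boolean Series: True where any word occurs as a literal substring."""
    # re.escape neutralises regex metacharacters so each word is matched literally.
    pattern = '|'.join(re.escape(word) for word in words)
    return series.str.contains(pattern)

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
result = contains_any(frame['a'], mylist)
print(result.tolist())  # [True, False, True]
```

For a list of ordinary words like the one above, the result is the same as with the unescaped pattern; the escaping only matters once the list holds strings such as 'a.b'.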
Because regular expression patterns are supported, you can also embed flags:
frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
frame
                         a
0  Cat Mr. Nibbles is blue
1         the sky is green
2         the dog is black
# One leading (?i) covers the whole pattern. Repeating it before each word is deprecated since Python 3.6 and raises an error on Python 3.11+.
pattern = '(?i)' + '|'.join(mylist)
pattern
'(?i)dog|cat|fish'
frame.a.str.contains(pattern)
0     True  # Because of the (?i) flag, 'Cat' also matches 'cat'
1    False
2     True
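As an alternative to an inline flag, str.contains also takes a case parameter. A short sketch of the same case-insensitive match, reusing the example data from above:

```python
import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# case=False makes the whole pattern case-insensitive, so no inline flag is needed.
matches = frame['a'].str.contains(pattern, case=False)
print(matches.tolist())  # [True, False, True]
```

This keeps the pattern itself free of flags, which can be easier to read when the word list is built elsewhere.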