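Python - Chunks and Chinks

Chunking is the process of grouping similar words together based on their nature, that is, their part-of-speech tags. In the example below, we define a grammar that determines how chunks are generated. The grammar specifies the sequence of tags, such as determiners, adjectives, and nouns, that a phrase must match to be grouped into a chunk. The resulting chunk tree is also displayed graphically.

    import nltk

    sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
                ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

    # NP chunk: an optional determiner, any number of adjectives, then a noun
    grammar = "NP: {<DT>?<JJ>*<NN>}"
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(sentence)
    print(result)
    result.draw()  # opens a Tk window, so a graphical display is required

When we run the above program, the printed tree groups "The small red flower" and "the window" into NP chunks, while "flew" and "through" remain outside any chunk. result.draw() shows the same tree graphically.

Changing the grammar gives a different output, as shown below.

    import nltk

    sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
                ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

    # NP chunk: a determiner, at most one adjective, then zero or more nouns
    grammar = "NP: {<DT><JJ>?<NN>*}"
    chunkprofile = nltk.RegexpParser(grammar)
    result = chunkprofile.parse(sentence)
    print(result)
    result.draw()

When we run this version, the NP chunks are "The small" and "the window". "red" and "flower" are left unchunked, because the pattern allows at most one adjective after the determiner.

Chinking

Chinking is the process of removing a sequence of tokens from a chunk. If the sequence of tokens appears in the middle of the chunk, those tokens are removed, leaving two chunks where there was only one before.

    import nltk

    sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
                ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

    grammar = r"""
    NP:
        {<.*>+}          # Chunk everything
        }<JJ|NN>+{       # Chink sequences of JJ and NN
    """
    chunkprofile = nltk.RegexpParser(grammar)
    result = chunkprofile.parse(sentence)
    print(result)
    result.draw()

When we run the above program, the whole sentence is first grouped into a single NP chunk. The chink rule then removes "small red flower" and "window", leaving two NP chunks: "The" and "flew through the".

As you can see, the tokens that match the chink pattern are left out of the noun phrase, which splits it into separate chunks. This process of excluding text from the desired chunks is called chinking.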
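Because result.draw() needs a graphical display, it can be handy to pull the chunks out of the parse tree in code instead. The sketch below assumes NLTK is installed and relies on nltk.RegexpParser together with the Tree methods subtrees() and leaves(); the helper name np_chunks is hypothetical and chosen only for illustration.

```python
import nltk

# Same POS-tagged sentence as in the examples above
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]


def np_chunks(tagged, grammar):
    """Hypothetical helper: return the words of every NP chunk found by a
    RegexpParser grammar, in sentence order."""
    tree = nltk.RegexpParser(grammar).parse(tagged)
    # subtrees(filter=...) walks the tree; keep only the NP nodes.
    # Each leaf of a chunk is a (word, tag) pair, so keep just the word.
    return [[word for word, tag in subtree.leaves()]
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]


chunk_grammar = "NP: {<DT>?<JJ>*<NN>}"
chink_grammar = r"""
NP:
    {<.*>+}        # chunk everything
    }<JJ|NN>+{     # chink sequences of JJ and NN
"""

print(np_chunks(sentence, chunk_grammar))
print(np_chunks(sentence, chink_grammar))
```

Returning plain word lists makes the results easy to compare or feed into later processing, while print(result) remains the quickest way to inspect the full tree.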
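For the special case used in the chinking example, where the rule {<.*>+} first puts every token into one chunk, the effect of a chink rule can be sketched in plain Python without NLTK. This is a simplified illustration under that assumption, not how RegexpParser works internally (it rewrites a bracketed string of tags with regular expressions); the function chunk_then_chink and its chink_tags parameter are hypothetical.

```python
def chunk_then_chink(tagged, chink_tags):
    """Simplified sketch: treat every token as part of one chunk, then chink
    out the tokens whose tag is in chink_tags. Returns the surviving chunks
    as lists of (word, tag) pairs."""
    chunks = []
    current = []
    for word, tag in tagged:
        if tag in chink_tags:
            # A chinked token closes the chunk in progress (if any),
            # so a chink in the middle splits one chunk into two.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append((word, tag))
    if current:  # keep whatever remains after the last chink
        chunks.append(current)
    return chunks


sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

for chunk in chunk_then_chink(sentence, {"JJ", "NN"}):
    print(" ".join(word for word, tag in chunk))
```

With {"JJ", "NN"} this leaves the same two chunks as the NLTK chink rule: "The" and "flew through the". A chink at the edge of a chunk only shortens it, and a chink covering a whole chunk removes it entirely, because an empty chunk is never appended.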