Porter Stemmer algorithm not returning the expected output when modified into a def?

小编典典

algorithm

I am using the Python port of the Porter Stemmer.

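The Porter stemming algorithm (or "Porter stemmer") is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.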

I need it for the following:

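The other thing you need to do is reduce each word to its stem. For example, the words sing, sings and singing all have the same stem, which is sing. There is a reasonably accepted way to do this, called Porter's algorithm. You can download something that performs it from http://tartarus.org/martin/PorterStemmer/.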

And I have modified this code:

if __name__ == '__main__':
    p = PorterStemmer()
    if len(sys.argv) > 1:
        for f in sys.argv[1:]:
            infile = open(f, 'r')
            while 1:
                output = ''
                word = ''
                line = infile.readline()
                if line == '':
                    break
                for c in line:
                    if c.isalpha():
                        word += c.lower()
                    else:
                        if word:
                            output += p.stem(word, 0,len(word)-1)
                            word = ''
                        output += c.lower()
                print output,
            infile.close()

I changed it to read from input, a preprocessed string, instead of from a file, and to return the output:

def algorithm(input):
    p = PorterStemmer()
    while 1:
        output = ''
        word = ''
        if input == '':
            break
        for c in input:
            if c.isalpha():
                word += c.lower()
            else:
                if word:
                    output += p.stem(word, 0,len(word)-1)
                    word = ''
                output += c.lower()
        return output

Note that if I put return output at the same indentation level as while 1:, it turns into an infinite loop.

Usage (example)

import PorterStemmer as ps
ps.algorithm("Michael is Singing");

Output

Michael is

Expected output

Michael is sing

What am I doing wrong?

2020-07-28

1 answer

小编典典

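It looks like the culprit is that the code never writes the final part of the input to output. A word is only appended when a non-letter character follows it, so a word at the very end of the string is never flushed. Try "Michael is Singing stuff", for example: everything is written correctly except that "stuff" is omitted. There is probably a more elegant way to handle this, but one thing you can try is adding an else clause to the for loop. Since the problem is that the final word is not included in output, we can use else to make sure it is added when the for loop completes: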

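def algorithm(input):
    print(input)  # debug output
    p = PorterStemmer()
    while 1:
        output = ''
        word = ''
        if input == '':
            break
        for c in input:
            if c.isalpha():
                word += c.lower()
            else:
                if word:
                    output += p.stem(word, 0, len(word)-1)
                    word = ''
                # always keep the non-letter character, even with no pending word
                output += c.lower()
        else:
            # the for loop has finished: flush the final word, if any
            if word:
                output += p.stem(word, 0, len(word)-1)
        print(output)  # debug output
        return output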
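
For readers who have not met it before, the else clause on a for loop runs once the loop finishes without hitting break. The minimal sketch below shows the same flush pattern in isolation. Note that split_words is a hypothetical helper written only for this illustration, and it does no stemming at all.

```python
def split_words(text):
    """Collect lowercase alphabetic runs from text (hypothetical helper, no stemming)."""
    words = []
    word = ''
    for c in text:
        if c.isalpha():
            word += c.lower()
        elif word:
            # a non-letter character ends the current word
            words.append(word)
            word = ''
    else:
        # reached when the loop ends without break: flush the last word
        if word:
            words.append(word)
    return words


print(split_words("Michael is Singing"))
```

Because the loop contains no break, placing the flush directly after the loop, at the same indentation as the for statement, should behave the same way; the else form mainly makes the intent explicit.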

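This has been extensively tested with two test cases, so clearly it is bulletproof :) There are probably some edge cases crawling around in there, but hopefully it will get you started.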
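
One possible way to sidestep such edge cases is to let a regular expression find the words and stem each match, which treats the last word like any other and needs no while loop. This is only a sketch under stated assumptions: stem_text and ToyStemmer are names made up for this illustration, and ToyStemmer is a stand-in that merely strips a trailing "ing" so the snippet runs on its own. It is not the Porter algorithm. In real use you would pass an instance of the PorterStemmer class from the downloaded module instead, assuming its stem(word, start, end) call works as in the question.

```python
import re


class ToyStemmer:
    """Stand-in for PorterStemmer (NOT the Porter algorithm): strips a trailing 'ing'."""

    def stem(self, word, start, end):
        word = word[start:end + 1]
        if word.endswith('ing') and len(word) > 4:
            return word[:-3]
        return word


def stem_text(text, stemmer):
    """Lowercase text and replace each run of ASCII letters with its stem."""
    def replace(match):
        word = match.group(0)
        return stemmer.stem(word, 0, len(word) - 1)

    # [a-z]+ only matches ASCII letters, unlike str.isalpha() in the code above
    return re.sub(r'[a-z]+', replace, text.lower())


print(stem_text("Michael is Singing", ToyStemmer()))
```

Everything that is not a letter (spaces, punctuation, digits) passes through unchanged, and an empty string simply yields an empty string rather than None.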

2020-07-28