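Updated answer: NLTK works with Python 2.7. I had 3.2, so I uninstalled 3.2 and installed 2.7. Now it works!
I have installed NLTK and tried to download the NLTK data. What I did was follow the instructions on this site: http://www.nltk.org/data.html
I downloaded NLTK, installed it, and then tried to run the following code: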
>>> import nltk
>>> nltk.download()
It gave me the following error message:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
Directory of C:\Python32\Lib\site-packages
I tried both nltk.download() and nltk.downloader(); both gave me error messages.
Then I used help(nltk) to pull up the package contents, and it shows the following info:
NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk
I do see Downloader there, so I am not sure why it does not work. Python 3.2.2, Windows Vista.
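A quick way to narrow down an error like this is to check which interpreter and which copy of the package are actually in use. The following is only a minimal sketch: `describe_module` is a hypothetical helper name, not part of NLTK, and it relies solely on the standard library.

```python
import importlib
import sys


def describe_module(name, attrs=("download", "downloader")):
    """Report where a module was imported from and which attributes it exposes.

    `describe_module` is a hypothetical helper for illustration only.
    """
    module = importlib.import_module(name)
    return {
        "python": sys.version.split()[0],               # interpreter version in use
        "file": getattr(module, "__file__", None),      # path the module was loaded from
        "version": getattr(module, "__version__", None),
        "attrs": {a: hasattr(module, a) for a in attrs},
    }
```

Calling describe_module("nltk") in the affected interpreter should show whether the installed copy exposes download at all, and which Python version it is running under.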
To download a particular dataset/model, use the nltk.download() function. For example, if you want to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
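In a script it can be convenient to download a resource only when it is missing. Here is a minimal sketch, assuming `nltk.data.find` raises LookupError for a missing resource (as the traceback further down suggests). `ensure_resource` and its `find`/`download` parameters are hypothetical names added for illustration; they let the helper be tried out with stand-in functions.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download `package_id` only if `resource_path` is not found in nltk_data.

    Returns True if a download was triggered, False if the resource was
    already present. `find` and `download` default to NLTK's own functions.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins can be used without NLTK
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
        return False  # already installed, nothing to do
    except LookupError:
        download(package_id)
        return True
```

For the punkt example above, the call would be ensure_resource('tokenizers/punkt', 'punkt').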
If you are unsure which data/models you need, you can start with the basic list of data and models:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources, which includes:
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
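If you want the package ids from a collection listing like this as a Python list (for instance, to download them one at a time), the XML can be parsed with the standard library. This is a small sketch; `collection_refs` is a hypothetical helper, and the XML string is only a three-item excerpt of the listing, not the full collection.

```python
import xml.etree.ElementTree as ET

# Excerpt of the "popular" collection listing; the full listing has more items.
POPULAR_EXCERPT = """
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
</collection>
"""


def collection_refs(xml_text):
    """Return the collection id and the list of package ids it references."""
    root = ET.fromstring(xml_text.strip())
    refs = [item.get("ref") for item in root.iter("item")]
    return root.get("id"), refs


collection_id, refs = collection_refs(POPULAR_EXCERPT)
print(collection_id, refs)
```

Each id in refs could then be passed to nltk.download() individually.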
Edited: in case anyone wants to avoid the errors that come from downloading the larger datasets from nltk, from https://stackoverflow.com/a/38135306/610569:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index to treat panlex_lite as it's already installed.
>>> dler.download('popular')
Updated
From v3.2.5, NLTK gives a more informative error message when an nltk_data resource is not found, e.g.:
>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
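Since the missing resource surfaces as a LookupError, one option is to catch it, download the package named in the message, and retry once. This is a rough sketch under that assumption; `call_with_resource` and its `download` parameter are hypothetical names for illustration, not an NLTK API.

```python
def call_with_resource(func, package_id, download=None):
    """Call `func()`; on LookupError, download `package_id` and retry once.

    `download` defaults to nltk.download and can be replaced by a stand-in.
    A second LookupError on the retry is allowed to propagate.
    """
    try:
        return func()
    except LookupError:
        if download is None:
            import nltk  # imported lazily, only when a download is needed
            download = nltk.download
        download(package_id)
        return func()
```

For the traceback above, this might be used as call_with_resource(lambda: word_tokenize('x'), 'punkt'), where the package id is the one named in the error message.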