This simple Python 3 script:
    import urllib.request

    host = "scholar.google.com"
    link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
    url = "http://" + host + link
    filename = "cite0.bib"
    print(url)
    urllib.request.urlretrieve(url, filename)
raises this exception:
    Traceback (most recent call last):
      File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module>
        urllib.request.urlretrieve(url, filename)
      File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
        return _urlopener.retrieve(url, filename, reporthook, data)
      File "C:\Python32\lib\urllib\request.py", line 1597, in retrieve
        block = fp.read(bs)
    ValueError: read of closed file
I thought this might be a temporary problem, so I added some simple exception handling, like this:
    import random
    import time
    import urllib.request

    host = "scholar.google.com"
    link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
    url = "http://" + host + link
    filename = "cite0.bib"
    print(url)
    while True:
        try:
            print("Downloading...")
            time.sleep(random.randint(0, 5))
            urllib.request.urlretrieve(url, filename)
            break
        except ValueError:
            pass
But this just prints "Downloading..." over and over, indefinitely.
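Your URL returns a 403 error, and apparently urllib.request.urlretrieve is not good at detecting all HTTP errors. In the Python 3.2 shown in your traceback it goes through urllib.request.FancyURLopener, which tries to swallow the error by returning a urlinfo instead of raising an exception. Because the 403 is not a temporary failure, every retry in your loop hits the same ValueError, which is why the loop never ends.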
As for the fix: if you still want to use urlretrieve, you can override FancyURLopener like this (the code also surfaces the error):
    import urllib.request
    from urllib.request import FancyURLopener


    class FixFancyURLOpener(FancyURLopener):

        def http_error_default(self, url, fp, errcode, errmsg, headers):
            if errcode == 403:
                raise ValueError("403")
            return super(FixFancyURLOpener, self).http_error_default(
                url, fp, errcode, errmsg, headers
            )

    # Monkey Patch
    urllib.request.FancyURLopener = FixFancyURLOpener

    url = "http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
    urllib.request.urlretrieve(url, "cite0.bib")
Otherwise, and this is what I recommend, you can use urllib.request.urlopen like this:
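    url = 'http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0'
    # urlopen raises urllib.error.HTTPError on a 403 instead of hiding it
    with urllib.request.urlopen(url) as fp, open("cite0.bib", "wb") as fo:
        # fp.read() returns bytes, so the file must be opened in binary mode
        fo.write(fp.read())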
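If you want the script to react to the status code rather than crash, here is a minimal sketch of the urlopen approach with explicit error handling. The helper name download and its opener parameter are my own additions for illustration (the parameter only exists so the function can be tried without touching the network); they are not part of urllib.

```python
import shutil
import urllib.error
import urllib.request


def download(url, filename, opener=urllib.request.urlopen):
    """Save the response body to `filename`.

    Returns None on success, or the HTTP status code if the server
    answered with an error such as 403. `opener` is a hypothetical hook
    that defaults to urllib.request.urlopen.
    """
    try:
        # The response is opened first, so no file is created on an HTTP error.
        with opener(url) as response, open(filename, "wb") as out:
            shutil.copyfileobj(response, out)  # copies bytes in chunks
    except urllib.error.HTTPError as err:
        return err.code
    return None
```

With this, a call like download(url, "cite0.bib") should hand back 403 for the URL in the question, so the caller can stop instead of retrying a request that will keep being refused.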