I have a piece of code like this:
    host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (query, page)
    req = urllib2.Request(host)
    req.add_header('User-Agent', User_Agent)
    response = urllib2.urlopen(req)
When the query I enter is more than one word, for example a phrase containing "dog", I get the following error:
        response = urllib2.urlopen(req)
      File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 400, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/urllib2.py", line 513, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/urllib2.py", line 438, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 521, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 400: Bad Request
Can anyone point out what I'm doing wrong? Thanks in advance.
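Your multi-word query (the "dog" example) returns a 400 error because you aren't escaping the query string before putting it into the URL. A query of more than one word contains a space, and a raw space is not valid inside a URL.
If you do this: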
    import urllib, urllib2

    # Percent-encode the query so spaces and other unsafe characters are valid in the URL
    quoted_query = urllib.quote(query)
    host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (quoted_query, page)
    req = urllib2.Request(host)
    req.add_header('User-Agent', User_Agent)
    response = urllib2.urlopen(req)
it will work.
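To see what the escaping actually produces without sending a request, here is a minimal sketch. The helper name `build_search_url` and the sample query `'the dog'` are made up for illustration, and only the `q` and `first` parameters are kept. The import is written to work on both Python 2 and Python 3, where the same functions live in `urllib.parse`.

```python
try:
    from urllib import quote, urlencode        # Python 2
except ImportError:
    from urllib.parse import quote, urlencode  # Python 3


def build_search_url(query, page):
    """Hypothetical helper: build the search URL with an escaped query string."""
    # urlencode escapes every value and joins the pairs with '&';
    # a list of tuples keeps the parameter order fixed.
    params = [('q', query), ('first', page)]
    return 'http://www.bing.com/search?' + urlencode(params)


# quote() percent-encodes the space; urlencode() uses '+' for it instead.
# Both forms are accepted in a query string.
print(quote('the dog'))                # the%20dog
print(build_search_url('the dog', 1))  # http://www.bing.com/search?q=the+dog&first=1
```

Either form removes the raw space from the URL, which is the part the server was rejecting.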
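However, I strongly recommend using requests rather than urllib / urllib2 / httplib. It is much easier to use, and it handles all of this escaping for you.
Here is the same code with python requests: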
    import requests

    # requests escapes the values in params itself, so query can be passed as-is
    results = requests.get("http://www.bing.com/search",
                           params={'q': query, 'first': page},
                           headers={'User-Agent': User_Agent})