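I have a Python script that connects to the Twitter Firehose and sends the data downstream for processing. It was working fine before, but now I'm trying to get only the text body. (This is not a question about how I should extract data from Twitter or how to encode/decode ASCII characters.) So when I launch the script directly like this: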
python -u fetch_script.py
it works fine and I can see the messages being printed to the screen. For example:
root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py
Cuz I'm checking you out >on Facebook<
RT @SearchlightNV: #BarryLies has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
"Why do men chase after women? Because they fear death."~Moonstruck
RT @SearchlightNV: #BarryLieshas crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
Never let anyone tell you not to chase your dreams. My sister came home crying today, because someone told her she's not good enough.
"I can't even ask anyone out on a date because if it doesn't end up in a high speed chase, I get bored."
RT @ColIegeStudent: Double-checking the attendance policy while still in bed
Well I just handed my life savings to ya.. #trustingyou #abouttomakebankkkkk
Zillow $Z and Redfin useless to Wells Fargo Home Mortgage, $WFC, and FannieMae $FNM. Sale history LTV now 48%, $360 appraisal fee 4 no PMI.
The latest Dump and Chase Podcast http://t.co/viaRSA9W3i check it out and subscribe on iTunes, or your favorite android app #Isles
But if I try to redirect the output to a file, like this:
python -u fetch_script.py >fetch_output.txt
it throws an error at me:
root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py >fetch_output.txt
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
    callback(*args)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
    self.parse_response(response)
  File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
    self._callback(response)
  File "fetch_script.py", line 57, in callback
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
ERROR:tornado.application:Exception in callback <functools.partial object at 0x187c2b8>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in _run_callback
    callback()
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
    callback(*args)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
    self.parse_response(response)
  File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
    self._callback(response)
  File "fetch_script.py", line 57, in callback
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
The error occurs in the callback function:
def callback(self, message):
    if message:
        msg = message
        msg_props = pika.BasicProperties()
        msg_props.content_type = 'application/text'
        msg_props.delivery_mode = 2
        #print self.count
        print msg['text']
        #self.count += 1
        ...
However, if I remove the ['text'] and live with print msg, both cases work like a charm.
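Since nobody has jumped in yet, here's my shot. Python 2 sets sys.stdout.encoding when stdout is a console, but leaves it as None when stdout is redirected to a file, so print falls back to the 'ascii' codec. This script shows the problem: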
import sys

msg = {'text': u'\u2026'}
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)  # stderr stays visible under redirection
print msg['text']  # fails when sys.stdout.encoding is None
Running it (saved as bad.py) with stdout redirected shows the error:
$ python bad.py >/tmp/xxx
default encoding: None
Traceback (most recent call last):
  File "bad.py", line 5, in <module>
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 0: ordinal not in range(128)
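The underlying failure can be reproduced without any redirection, which may make it easier to see what print is doing behind the scenes. The following is a minimal sketch, not part of the original script: the helper name can_encode and the sample text are made up for illustration, and the code is written so that it runs on both Python 2.7 and Python 3.

```python
# Sample text ending in U+2026 HORIZONTAL ELLIPSIS, the character from the traceback.
text = u'chase your dreams\u2026'


def can_encode(value, encoding):
    """Return True if `value` can be encoded with `encoding` (hypothetical helper)."""
    try:
        value.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# 'ascii' only covers code points 0-127, so the ellipsis cannot be encoded.
ascii_ok = can_encode(text, 'ascii')
# UTF-8 can represent every Unicode code point.
utf8_ok = can_encode(text, 'utf-8')

print(ascii_ok)
print(utf8_ok)
```

This probably also explains why print msg works in both cases: in Python 2, printing a dict prints the repr of its values, and the repr of a unicode string escapes non-ASCII characters (as in u'\u2026'), so the output is plain ASCII.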
Adding an explicit encoding:
import sys

msg = {'text': u'\u2026'}
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
encoding = sys.stdout.encoding or 'utf-8'  # fall back to UTF-8 when stdout is redirected
print msg['text'].encode(encoding)
solves the problem:
$ python good.py >/tmp/xxx
default encoding: None
$ cat /tmp/xxx
…
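If encoding at every print call is inconvenient, another option is to wrap the output stream once so that everything written to it is encoded as UTF-8. The sketch below uses codecs.getwriter from the standard library. The helper name utf8_writer is made up, and io.BytesIO stands in for a redirected stdout so the example can run anywhere (Python 2.7 or Python 3).

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so unicode text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf-8')(byte_stream)


# Stand-in for a redirected stdout (assumption for the demo, not the real script).
fake_stdout = io.BytesIO()
out = utf8_writer(fake_stdout)

# Writing text that contains U+2026 no longer raises UnicodeEncodeError.
out.write(u'hard-working citizens #2A\u2026\n')

written = fake_stdout.getvalue()
print(repr(written))
```

In the real script on Python 2.7 the same idea would presumably be a single line at startup, sys.stdout = codecs.getwriter('utf-8')(sys.stdout). Setting the environment variable PYTHONIOENCODING=utf-8 before launching the script should have a similar effect without touching the code.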