How can I download files with Python in a "smarter" way?

小编典典

python

I need to download several files over HTTP in Python.

The most obvious way to do it is simply to use urllib2:

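import urllib2  # Python 2 only; in Python 3 the module is urllib.request

u = urllib2.urlopen('http://server.com/file.html')
# Open in binary mode so the downloaded bytes are written unmodified
localFile = open('file.html', 'wb')
localFile.write(u.read())
localFile.close()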

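But I have to deal with URLs that are awkward in some way, for example: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When the file is downloaded through a browser, it gets a human-readable name, e.g. accounts.pdf.

Is there any way to handle this in Python, so that I don't need to know the file names and hard-code them into the script?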


2020-12-20

1 answer

小编典典

Download scripts like that tend to send a header that tells the user agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can read that header, you can get the proper filename.
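As a rough illustration of pulling the name out of such a header, here is a minimal Python 3 sketch that parses the header value with the standard library's email.message.Message. The function name filename_from_header is hypothetical, and the os.path.basename step is an added precaution, not something the answer itself prescribes.

```python
import os
from email.message import Message


def filename_from_header(header_value):
    """Return the filename named in a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg['Content-Disposition'] = header_value
    name = msg.get_filename()  # reads the header's filename parameter
    if not name:
        return None
    # Keep only the last path component so the server cannot pick the directory
    return os.path.basename(name) or None


print(filename_from_header('attachment; filename="the filename.ext"'))
```

A header without a filename parameter (or no header at all) yields None, so the caller still needs a fallback name for that case.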

There is another thread with a bit of code for grabbing the Content-Disposition header.

import urllib2
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
# Raw header value; raises KeyError if the server did not send the header
remotefile.info()['Content-Disposition']
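Putting the pieces together, a Python 3 version of the whole download might look like the sketch below (urllib2 became urllib.request in Python 3). The helper names choose_filename and download, the fallback to the last URL path segment, and the default name download.bin are all assumptions made for illustration; they are not part of the original answer.

```python
import os
import shutil
import urllib.request
from email.message import Message
from urllib.parse import unquote, urlsplit


def choose_filename(headers, url, default='download.bin'):
    """Pick a local filename: Content-Disposition first, then the URL path."""
    # headers is a Message-style object such as response.headers
    name = headers.get_filename()
    if not name:
        # Fall back to the last segment of the URL path
        name = unquote(urlsplit(url).path.rsplit('/', 1)[-1])
    # Strip any directory part the server or the URL may have supplied
    return os.path.basename(name) or default


def download(url, dest_dir='.'):
    """Download url into dest_dir and return the path that was written."""
    with urllib.request.urlopen(url) as response:
        path = os.path.join(dest_dir, choose_filename(response.headers, url))
        with open(path, 'wb') as out:  # binary mode keeps the bytes intact
            shutil.copyfileobj(response, out)
    return path


# Offline check with a hand-built header object (no network needed)
demo = Message()
demo['Content-Disposition'] = 'attachment; filename="accounts.pdf"'
print(choose_filename(demo, 'http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf'))
```

Calling download(url) once per URL in a list would cover the "several files" case from the question. Note that this sketch silently overwrites an existing file with the same name.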

2020-12-20