Python - Processing PDF

Python can read a PDF file, extract its text and print the contents. To do this, we first need to install the required module, PyPDF2. The command to install it is shown below. pip must already be available in your Python environment.

    pip install PyPDF2

Once the module is installed, we can read the PDF file with the methods the module provides. The code below uses the PyPDF2 3.x API (PdfReader, pages, extract_text). The older names PdfFileReader, getPage and extractText no longer work in PyPDF2 3.0 and later.

    import PyPDF2

    pdfName = r'path\codingdict.pdf'
    read_pdf = PyPDF2.PdfReader(pdfName)
    # Take the first page (index 0) and extract its text
    page = read_pdf.pages[0]
    page_content = page.extract_text()
    print(page_content)

When we run the above program, we get the following output:

    Tutorials Point originated from the idea that there exists a class of readers who respond better to online content and prefer to learn new skills at their own pace from the comforts of their drawing rooms. The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, we worked our way to adding fresh tutorials to our repository which now proudly flaunts a wealth of tutorials and allied articles on topics ranging from programming languages to web designing to academics and much more.

Reading Multiple Pages

To read a PDF with multiple pages and print each page together with its page number, we loop over the pages and use the get_page_number() function. In the example below, the PDF file has two pages, and the content is printed under two separate page headings.

    import PyPDF2

    pdfName = r'Path\codingdict2.pdf'
    read_pdf = PyPDF2.PdfReader(pdfName)
    for page in read_pdf.pages:
        # get_page_number() is zero-based, so add 1 for the heading
        print('Page No - ' + str(1 + read_pdf.get_page_number(page)))
        page_content = page.extract_text()
        print(page_content)

When we run the above program, we get the following output:

    Page No - 1
    Tutorials Point originated from the idea that there exists a class of readers who respond better to online content and prefer to learn new skills at their own pace from the comforts of their drawing rooms.
    Page No - 2
    The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, we worked our way to adding fresh tutorials to our repository which now proudly flaunts a wealth of tutorials and allied articles on topics ranging from programming languages to web designing to academics and much more.
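If you need the text of every page as a single string rather than printed directly, the page loop can be split into a small helper. This is a sketch, not part of the original tutorial. It assumes PyPDF2 3.x (PdfReader, pages, extract_text), and the helper names format_pages and read_pdf_text are hypothetical. It numbers pages with enumerate() instead of get_page_number(), which avoids looking each page up again in the reader.

```python
from typing import Iterable, Protocol


class PageLike(Protocol):
    # Any object with an extract_text() method, e.g. a PyPDF2 PageObject.
    def extract_text(self) -> str: ...


def format_pages(pages: Iterable[PageLike]) -> str:
    """Join the text of every page, each preceded by a 'Page No - N' heading."""
    chunks = []
    # enumerate(..., start=1) gives one-based page numbers for the headings
    for number, page in enumerate(pages, start=1):
        chunks.append(f"Page No - {number}")
        chunks.append(page.extract_text())
    return "\n".join(chunks)


def read_pdf_text(path: str) -> str:
    # Imported lazily so format_pages can be used without PyPDF2 installed.
    # Assumption: PyPDF2 3.x is installed (pip install PyPDF2).
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return format_pages(reader.pages)
```

For example, `print(read_pdf_text(r'Path\codingdict2.pdf'))` should print each page's text under the same "Page No - N" headings as the loop in the tutorial. Because format_pages only needs objects with extract_text(), it can also be checked with simple stand-in page objects instead of a real PDF.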