I want to use multiprocessing in my code to get better performance.
However, I get the following error:
Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
I tried another approach and got this error instead:
TypeError: cannot serialize '_io.TextIOWrapper' object
My code looks like this:
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)
I think the error is raised because Pool is not used in the main function.
Is my guess right? And how can I modify the code to fix the error?
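The problem is that your Book instance has an instance variable (namelist) that can't be pickled. Because you're calling pool.map on an instance method, and you're running on Windows, the entire instance has to be picklable so it can be passed to the child process. Book.namelist holds open file objects (_io.BufferedReader), which can't be pickled. You can fix this in a couple of ways. Based on the example code, it looks like you could just make format_char a top-level function: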
from multiprocessing import Pool

def format_char(char):
    # Top-level function: only the function reference and its argument are pickled.
    char = char + "a"
    return char

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
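To see why the instance-method version drags the open files along, here is a minimal sketch. It relies on CPython 3.4+ behaviour, where a bound method reduces to `getattr(instance, name)` for pickling, and it uses the plain `pickle` module as a stand-in for the pickler that multiprocessing uses internally. The helper names `inspect_bound_method`, `file_is_picklable` and `demo` are made up for illustration.

```python
import os
import pickle
import tempfile


class Book(object):
    def __init__(self, namelist):
        self.namelist = namelist  # list of open file objects, as in the question

    def format_char(self, char):
        return char + "a"


def inspect_bound_method(book):
    """Return what pickle would store for book.format_char.

    In CPython 3.4+ a bound method reduces to (getattr, (instance, name)),
    so pickling the method means pickling the whole instance.
    """
    func, args = book.format_char.__reduce__()
    return func, args


def file_is_picklable(f):
    """Return True if pickle.dumps(f) succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(f)
        return True
    except TypeError:
        return False


def demo():
    # Throwaway file so the sketch does not depend on the current directory.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path) as f:
            book = Book([f])
            func, args = inspect_bound_method(book)
            # (is it getattr?, is the instance attached?, can the file be pickled?)
            return func is getattr, args[0] is book, file_is_picklable(f)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The bound method carries the Book instance, the instance carries namelist, and the file objects in namelist are what the pickler refuses. A top-level function has no instance attached, so only the strings in charlist are sent to the workers.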
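However, if you actually need format_char to be an instance method, you can use __getstate__/__setstate__ to make Book picklable by removing the namelist attribute from the instance's state before it is pickled: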
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']  # drop the unpicklable open file objects
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
This works fine as long as you don't need to access namelist in the child process.
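What the child process ends up with can be sketched by making the same hook calls by hand that pickle makes around a round trip. This is only a simulation under that assumption, not what multiprocessing literally executes, and `simulate_child_copy` and `demo` are hypothetical helper names.

```python
import os
import tempfile


class Book(object):
    def __init__(self, namelist):
        self.namelist = namelist
        self.tempread = ""

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable file objects.
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    def format_char(self, char):
        return char + "a"


def simulate_child_copy(book):
    """Mimic by hand the hook calls pickle makes when sending book to a child."""
    state = book.__getstate__()   # parent side: called before pickling
    clone = Book.__new__(Book)    # child side: __init__ is not run on unpickle
    clone.__setstate__(state)     # child side: called while unpickling
    return clone


def demo():
    # Throwaway file so the sketch does not depend on the current directory.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path) as f:
            book = Book([f])
            clone = simulate_child_copy(book)
            return (hasattr(book, "namelist"),
                    hasattr(clone, "namelist"),
                    clone.format_char("x"))
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The original object in the parent keeps namelist, the copy does not, and format_char still works on the copy because it never touches namelist. Any method that did read self.namelist in the child would raise AttributeError.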