I am running several cat | zgrep commands on a remote server and collecting the output of each one separately for further processing:
    import multiprocessing as mp
    import subprocess

    class MainProcessor(mp.Process):
        def __init__(self, peaks_array):
            super(MainProcessor, self).__init__()
            self.peaks_array = peaks_array

        def run(self):
            # Start one PeakProcessor per peak.
            for peak_arr in self.peaks_array:
                peak_processor = PeakProcessor(peak_arr)
                peak_processor.start()

    class PeakProcessor(mp.Process):
        def __init__(self, peak_arr):
            super(PeakProcessor, self).__init__()
            self.peak_arr = peak_arr

        def run(self):
            # Run the remote command and split its output into lines.
            command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex" '
            log_lines = (subprocess.check_output(command, shell=True)).split('\n')
            process_data(log_lines)
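However, this results in the subprocess('ssh … cat …') commands being executed sequentially. The second peak waits for the first one to finish, and so on.
How can I modify this code so that the subprocess calls run in parallel, while still being able to collect the output of each one separately?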
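Another approach (as an alternative to the other suggestions of putting the shell processes in the background) is to use multithreading.
The run method you have would then do something like this: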
    # The second argument is the (here empty) tuple of arguments for the function.
    thread.start_new_thread(myFuncThatDoesZGrep, ())
To collect the results, you can do something like this:
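    import threading

    class MyThread(threading.Thread):
        def run(self):
            self.finished = False
            # Your code to run the command here.
            blahBlah()
            # When finished, store the results first and set the flag last,
            # so that finished == True always implies results is available.
            self.results = []
            self.finished = True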
Run the threads as described in the multithreading link above. Once a thread object has myThread.finished == True, you can collect its results via myThread.results.
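Putting the pieces above together, here is a minimal sketch of the threaded approach, assuming Python 3.7 or later. The names CommandThread and run_all are hypothetical, and the commands in the demo are local stand-ins for the ssh pipeline in the question. Instead of polling a finished flag, this sketch waits with join(), which is a swapped-in simplification rather than what the answer above describes.

```python
import subprocess
import sys
import threading


class CommandThread(threading.Thread):
    """Runs one command and keeps its output lines on the thread object."""

    def __init__(self, command):
        super().__init__()
        self.command = command   # argument list, e.g. ["ssh", "remote_host", ...]
        self.results = None      # list of output lines once the command succeeds
        self.error = None        # exception raised by the command, if any

    def run(self):
        try:
            output = subprocess.check_output(self.command, text=True)
            self.results = output.splitlines()
        except (OSError, subprocess.CalledProcessError) as exc:
            self.error = exc


def run_all(commands):
    """Start every command in its own thread, wait for all, return the threads."""
    threads = [CommandThread(command) for command in commands]
    for t in threads:
        t.start()                # all commands are now running concurrently
    for t in threads:
        t.join()                 # block until this command has finished
    return threads


if __name__ == "__main__":
    # Stand-in commands (assumption): each one just prints a line locally.
    commands = [[sys.executable, "-c", "print('peak %d')" % i] for i in range(3)]
    for t in run_all(commands):
        print(t.results, t.error)
```

Threads are sufficient here because each one spends its time waiting on an external process, so the interpreter lock is not the bottleneck. Each thread keeps its own results attribute, so the outputs stay separate and can be processed per peak after join() returns.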