Remove specific lines from a large text file in Python

小编典典

python

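I have several large text files that all share the same structure. I want to delete the first three lines and then remove illegal characters from the fourth line. I don't want to read the whole dataset into memory and then modify it, because each file is over 100 MB and holds more than 4 million records.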

Range   150.0dB -64.9dBm
Mobile unit 1   Base    -17.19968    145.40369  999.8
Fixed unit  2   Mobile  -17.20180    145.29514  533.0
Latitude    Longitude   Rx(dB)  Best unit
-17.06694    145.23158  -050.5  2
-17.06695    145.23297  -044.1  2

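So lines 1, 2 and 3 should be deleted, and in line 4, "Rx(dB)" should become just "Rx" and "Best Unit" should be changed to "Best_Unit". Then I can use my other scripts to geocode the data.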

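I can't match the first three lines with a command-line program like grep, because their contents change from file to file. Lines 1-3 simply need to be deleted in full, and then grep or something similar can do the search-and-replace on line 4.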

Thanks, everyone.

=== EDIT: a new, more pythonic approach for handling larger files, from @heltonbiker. It raises an error.

import os, re
##infile = arcpy.GetParameter(0)
##chunk_size = arcpy.GetParameter(1) # number of records in each dataset

infile='trc_emerald.txt'
fc= open(infile)
Name = infile[:infile.rfind('.')]
outfile = Name+'_db.txt'

line4 = fc.readlines(100)[3]
line4 = re.sub('\([^\)].*?\)', '', line4)
line4 = re.sub('Best(\s.*?)', 'Best_', line4)
newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
fc.close()
newfile = open(outfile, 'w')
newfile.write(newfilestring)
newfile.close()

del lines
del outfile
del Name
#return chunk_size, fl
#arcpy.SetParameterAsText(2, fl)
print "Completed"

Traceback (most recent call last):
  File "P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Emerald\working\clean_file_1c.py", line 13, in <module>
    newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
TypeError: 'builtin_function_or_method' object is unsubscriptable

2021-01-16

1 answer

小编典典

As wim said in the comments, sed is the right tool for this. The following command should do what you need:

sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever

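A brief explanation of the command:

-i edits the file in place, i.e. writes the output back to the input file

-e adds the command that follows to the list of commands to run

'4 s/(dB)//' on line 4, replaces '(dB)' with '' (the empty string)

'4 s/Best Unit/Best_Unit/' same as above, but with a different search string and replacement

'1,3 d' deletes lines 1 through 3 (inclusive) entirely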

sed is a very powerful tool that can do far more than this, and it is well worth learning.
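
If you would rather stay in Python, the same three steps (drop lines 1-3, fix line 4, copy the rest) can be done one line at a time, so the 100 MB file is never held in memory. This is only a minimal sketch, not part of the sed answer: the function name clean_file and its parameters are made up for illustration, and matching both "Best unit" and "Best Unit" is an assumption, made because the sample data and the sed command spell it differently.

```python
import re
from itertools import islice


def clean_file(src_path, dst_path, skip=3):
    """Copy src_path to dst_path, dropping the first `skip` lines
    and cleaning up the header line that follows them.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        # Consume the first `skip` lines without storing them.
        for _ in islice(src, skip):
            pass

        # The next line is the column header (line 4 of the original file).
        header = next(src, "")
        header = header.replace("(dB)", "")
        # Assumption: accept either capitalisation of "unit".
        header = re.sub(r"Best\s+[Uu]nit", "Best_Unit", header)
        dst.write(header)

        # Stream the remaining records; the file object yields one line at a time.
        dst.writelines(src)
```

For example, clean_file('trc_emerald.txt', 'trc_emerald_db.txt') would write the cleaned copy next to the original. Unlike sed -i, this writes a new file rather than editing in place.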

2021-01-16