How do I split a large text file into smaller files with an equal number of lines?

小编典典

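I have a large (by number of lines) plain-text file that I'd like to split into smaller files, also by number of lines. So if my file has around 2M lines, I'd like to split it into 10 files of 200k lines each, or 100 files of 20k lines each (plus one file holding the remainder; being evenly divisible doesn't matter).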

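I could do this fairly easily in Python, but I'm wondering whether there is some ninja way to do it with Bash and Unix utilities (as opposed to manually looping and counting/partitioning lines).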

2022-03-04

1 answer

小编典典

Have a look at the split command:

$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                            before each output file is opened
      --help     display this help and exit
      --version  output version information and exit

You could do something like this:

split -l 200000 filename

This creates files of 200000 lines each (the last one holds whatever remains), named xaa, xab, xac, …
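If the goal is a target number of files rather than a fixed line count, the lines-per-file value can be computed first. The following is only a minimal sketch: the directory `split_demo`, the file `sample.txt`, the prefix `part_` and the count `n=10` are illustrative names that do not come from the question, and a 95-line file stands in for the large one.

```shell
# Sketch: split a file into n pieces of (almost) equal line count.
# split_demo, sample.txt, part_ and n=10 are illustrative assumptions.
mkdir -p split_demo
seq 1 95 > split_demo/sample.txt            # 95-line stand-in for the big file

n=10
total=$(wc -l < split_demo/sample.txt)      # total number of lines
per_file=$(( (total + n - 1) / n ))         # ceiling division, so at most n pieces

# PREFIX may include a directory; pieces become part_aa, part_ab, ...
split -l "$per_file" split_demo/sample.txt split_demo/part_

wc -l split_demo/part_*                     # per-piece line counts; the last piece is shorter
```

Rounding up keeps the number of pieces at or below `n`, with the last piece holding the remainder. Newer GNU coreutils releases also provide `split -n l/10`, which produces 10 pieces without breaking lines, but it balances the pieces by byte size rather than by line count.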

Another option is to split by output file size (still splitting only at line breaks):

split -C 20m --numeric-suffixes input_filename output_prefix

This creates files such as output_prefix00 output_prefix01 output_prefix02 ... (numeric suffixes start at 00 by default), each at most 20 megabytes in size.
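One way to check that a size-based split kept lines intact is to concatenate the pieces and compare the result with the input. This is a small sketch assuming GNU split; `size_demo`, `big.txt` and `chunk_` are made-up names, and a 1k limit is used instead of 20m so the demo stays tiny.

```shell
# Sketch assuming GNU split; size_demo, big.txt and chunk_ are illustrative names.
mkdir -p size_demo
seq 1 2000 > size_demo/big.txt              # a few kilobytes of short lines

# -C 1k: at most 1024 bytes of whole lines per piece; -d: numeric suffixes
split -C 1k -d size_demo/big.txt size_demo/chunk_

ls size_demo/chunk_*                        # chunk_00, chunk_01, ...

# Every piece ends on a line boundary, so joining the pieces in
# suffix order should reproduce the input byte for byte.
cat size_demo/chunk_* | cmp - size_demo/big.txt && echo "pieces match the original"
```

The shell expands `chunk_*` in sorted order, which matches the order in which split wrote the pieces as long as all suffixes have the same length.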

2022-03-04