My program reads a line from a file. The line contains comma-separated text, for example:
123,test,444,"don't split, this",more test,1
I want the split to produce this:
123
test
444
"don't split, this"
more test
1
If I use String.split(","), I get:
123
test
444
"don't split
 this"
more test
1
In other words: the comma inside the substring "don't split, this" is not a separator. How do I handle this?
You can try the following regular expression:
str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
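This splits the string on every comma that is followed by an even number of double quotes. In other words, it splits on commas that are outside the double quotes. It works provided the quotes in your string are balanced.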
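To check the behaviour on the sample line from the question, here is a minimal sketch. The class name QuoteAwareSplit and the method splitLine are made up for illustration. Passing a limit of -1 is my addition so that trailing empty fields are kept, and the sketch assumes the quotes in each line are balanced.

```java
public class QuoteAwareSplit {
    // A comma followed by an even number of double quotes up to the end of the line.
    private static final String OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one line on commas that are outside double quotes.
    public static String[] splitLine(String line) {
        // A negative limit keeps trailing empty fields, e.g. "a,b," gives three fields.
        return line.split(OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Prints one field per line; the quoted field keeps its comma and its quotes.
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
    }
}
```

Note that the quotes stay part of the field. If a line has an odd number of quotes, no comma before the unmatched quote sees an even count, so the split will not be what you expect.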
Explanation:
,                 // Split on comma
(?=               // Followed by
   (?:            // Start a non-capture group
     [^"]*        // 0 or more non-quote characters
     "            // 1 quote
     [^"]*        // 0 or more non-quote characters
     "            // 1 quote
   )*             // 0 or more repetitions of the non-capture group (a multiple of 2 quotes is always even)
   [^"]*          // Finally 0 or more non-quotes
   $              // Till the end (this is necessary, else every comma will satisfy the condition)
)
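You can even write it this way in your code by using the (?x) modifier in the regex. The modifier ignores any whitespace in the regex, which makes a regex broken across multiple lines easier to read, like so: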
String[] arr = str.split("(?x)   " +
                         ",          " +   // Split on comma
                         "(?=        " +   // Followed by
                         "  (?:      " +   // Start a non-capture group
                         "    [^\"]* " +   // 0 or more non-quote characters
                         "    \"     " +   // 1 quote
                         "    [^\"]* " +   // 0 or more non-quote characters
                         "    \"     " +   // 1 quote
                         "  )*       " +   // 0 or more repetitions of the non-capture group (a multiple of 2 quotes is always even)
                         "  [^\"]*   " +   // Finally 0 or more non-quotes
                         "  $        " +   // Till the end (this is necessary, else every comma will satisfy the condition)
                         ")          "     // End look-ahead
                         );
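Since the lines come from a file, it may be worth compiling the pattern once and reusing it. The sketch below does that with Pattern.COMMENTS, the flag equivalent of the inline (?x) modifier, which also allows # comments inside the regex itself. The class name CommentedSplitter and its split method are hypothetical, and the -1 limit (keep trailing empty fields) is my assumption about what you want.

```java
import java.util.regex.Pattern;

public class CommentedSplitter {
    // Compiled once; in COMMENTS mode whitespace is ignored and '#' starts
    // a comment that runs to the end of the line.
    private static final Pattern OUTSIDE_QUOTES = Pattern.compile(
        ",                # split on a comma\n" +
        "(?=              # that is followed by\n" +
        "  (?:            #   a non-capture group of\n" +
        "    [^\"]* \"    #     non-quote characters, then 1 quote\n" +
        "    [^\"]* \"    #     non-quote characters, then 1 quote\n" +
        "  )*             #   repeated 0 or more times, so the quote count is even\n" +
        "  [^\"]* $       #   then only non-quotes until the end\n" +
        ")",
        Pattern.COMMENTS);

    // Hypothetical helper: splits one line on commas outside double quotes.
    public static String[] split(String line) {
        return OUTSIDE_QUOTES.split(line, -1);
    }
}
```

Calling CommentedSplitter.split(line) for each line read from the file avoids recompiling the regex on every call, which str.split(regex) would otherwise do for a pattern like this.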