The following expression:

    ^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[\/\/]*\s*(end of)*\s*FEATURE)+?$

overflows the stack during matching when I run the compiled .jar file.
The string to be matched can look like this:

    this is a junk line

    #ifdef FEATURE
    #endif // end of FEATURE

    this is a junk line

    #ifdef FEATURE

    this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h

    #endif FEATURE

    this is a junk line
So both #ifdef FEATURE ... #endif FEATURE blocks should be matched. The error is:
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$Loop.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$Curly.match1(Unknown Source)
    at java.util.regex.Pattern$Curly.match(Unknown Source)
    at java.util.regex.Pattern$Slice.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$Loop.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$Curly.match1(Unknown Source)
    at java.util.regex.Pattern$Curly.match(Unknown Source)
    at java.util.regex.Pattern$Slice.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$Loop.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$Curly.match1(Unknown Source)
    at java.util.regex.Pattern$Curly.match(Unknown Source)
    at java.util.regex.Pattern$Slice.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$Loop.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$Curly.match1(Unknown Source)
    at java.util.regex.Pattern$Curly.match(Unknown Source)
    at java.util.regex.Pattern$Slice.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$Loop.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$Curly.match1(Unknown Source)
    at java.util.regex.Pattern$Curly.match(Unknown Source)
    at java.util.regex.Pattern$Slice.match(Unknown Source)
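Any workaround, or any way to improve the expression, is welcome. I tried atomic groups (?>...), but for some reason that did not simplify anything.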
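As one possible workaround, here is a minimal sketch that does not use a regex for the block structure at all and scans the text line by line instead. The repeating GroupHead/Loop/GroupTail frames in the trace suggest the engine recurses once per repetition of the line-matching group, so a long block means a deep stack; a plain loop has no such limit. The class LineStripper and the method strip(text, feature) are hypothetical names, and the sketch assumes a block ends at the first later line that is exactly "#endif FEATURE" or "#endif // end of FEATURE" after trimming whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class LineStripper {

    /**
     * Hypothetical helper: removes every block that starts with a line "#ifdef <feature>"
     * and ends at the first later line "#endif <feature>" or "#endif // end of <feature>".
     * Lines are compared after trim(), and the result is re-joined with "\n".
     */
    static String strip(String text, String feature) {
        String open = "#ifdef " + feature;
        String closePlain = "#endif " + feature;
        String closeComment = "#endif // end of " + feature;

        List<String> kept = new ArrayList<>();     // lines that survive
        List<String> pending = new ArrayList<>();  // lines of the block currently open
        boolean inside = false;

        // -1 keeps trailing empty strings, so a final newline is preserved
        for (String line : text.split("\r?\n", -1)) {
            String t = line.trim();
            if (!inside) {
                if (t.equals(open)) {
                    inside = true;
                    pending.add(line);
                } else {
                    kept.add(line);
                }
            } else if (t.equals(closePlain) || t.equals(closeComment)) {
                inside = false;
                pending.clear();                   // drop the whole block, #endif included
            } else {
                pending.add(line);                 // a line like "... #endif // h" does not close it
            }
        }
        kept.addAll(pending);                      // unterminated block: leave it untouched
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String text = "this is a junk line\n"
                + "#ifdef FEATURE\n"
                + "#endif // end of FEATURE\n"
                + "this is a junk line\n"
                + "#ifdef FEATURE\n"
                + "this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n"
                + "#endif FEATURE\n"
                + "this is a junk line";
        System.out.println(strip(text, "FEATURE"));
    }
}
```

The trade-off is that it only recognises the two #endif spellings listed, not the looser forms the original expression allows, and it normalises \r\n line endings to \n.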
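Here is the code:

    public String strip(String text) {
        // readFile() is my own helper (not shown); it returns the disabled feature names
        ArrayList<String> patterns = readFile("Disabled_Features.txt");
        for (int i = 0; i < patterns.size(); ++i) {
            String feature = patterns.get(i);
            // Match "#ifdef FEATURE" up to "#endif FEATURE" or "#endif // end of FEATURE",
            // where no line in between is itself that closing #endif
            Pattern todoPattern = Pattern.compile(
                    "^#ifdef " + feature
                    + "((?:\\r?\\n(?!#endif (?:// end of )?" + feature + "$).*)*)"
                    + "\\r?\\n#endif (?:// end of )?" + feature + "$",
                    Pattern.MULTILINE);
            Matcher m = todoPattern.matcher(text);
            text = m.replaceAll("");
        }
        return text;
    }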
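To try the same expression without the readFile helper, here is a self-contained variant. The class name FeatureStripper and the List parameter (in place of readFile("Disabled_Features.txt")) are my own assumptions. It also wraps each name in Pattern.quote(), which is a change from the code above: without it, a feature name containing regex metacharacters such as '.' or '+' would be interpreted as a pattern rather than as literal text. It keeps the same repeated group, so on its own it does not avoid deep recursion on very long blocks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeatureStripper {

    // Same expression as strip() above; feature names come from a list
    // and are matched as literal text.
    static String strip(String text, List<String> features) {
        for (String feature : features) {
            String f = Pattern.quote(feature);     // treat the name as literal text
            Pattern block = Pattern.compile(
                    "^#ifdef " + f
                    + "((?:\\r?\\n(?!#endif (?:// end of )?" + f + "$).*)*)"
                    + "\\r?\\n#endif (?:// end of )?" + f + "$",
                    Pattern.MULTILINE);
            Matcher m = block.matcher(text);
            text = m.replaceAll("");               // the removed block leaves an empty line behind
        }
        return text;
    }

    public static void main(String[] args) {
        String text = "keep 1\n#ifdef A.B\nremove me\n#endif // end of A.B\nkeep 2";
        System.out.println(strip(text, Arrays.asList("A.B")));
    }
}
```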
I tried the code @Wiktor wrote, and it works fine:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TestRegex {
        public static void main(String[] args) {
            String text = "this is a junk line\n"
                    + "\n"
                    + "#ifdef FEATURE \n"
                    + "#endif // end of FEATURE\n"
                    + "\n"
                    + "this is a junk line\n"
                    + "\n"
                    + "#ifdef FEATURE\n"
                    + "\n"
                    + "this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n"
                    + "\n"
                    + "#endif FEATURE\n"
                    + "\n"
                    + "this is a junk line";

            // this version does not use Pattern.MULTILINE, which should reduce the backtracking
            Matcher matcher2 = Pattern.compile(
                    "\\n#ifdef FEATURE((?:\\r?\\n(?!#endif (?:// end of )?FEATURE).*)*)\\r?\\n#endif (?:// end of )?FEATURE")
                    .matcher(text);
            while (matcher2.find()) {
                System.out.println(matcher2.group());
            }
        }
    }
This makes me think your problem is caused by the size of the input file.
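So if the file is too large, you can provide the input as your own CharSequence implementation that wraps the large text file. Why? Because Pattern.matcher(), which builds a Matcher from a Pattern, takes a CharSequence argument.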
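To illustrate the CharSequence point: Pattern.matcher() accepts any CharSequence, not only String, so the input can come from a class of your own. The sketch below uses a hypothetical CharArraySequence over a range of a char[], written only to show the four methods involved (length, charAt, subSequence, toString). A real large-file wrapper would presumably load parts of the file on demand inside charAt() rather than hold everything in one array; this sketch only changes where the characters come from, and the pattern is matched the same way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {

    // Hypothetical minimal CharSequence view over part of a char[] (no copying).
    static final class CharArraySequence implements CharSequence {
        private final char[] data;
        private final int start;
        private final int end;

        CharArraySequence(char[] data, int start, int end) {
            if (start < 0 || end > data.length || start > end) {
                throw new IndexOutOfBoundsException("bad range " + start + ".." + end);
            }
            this.data = data;
            this.start = start;
            this.end = end;
        }

        @Override
        public int length() {
            return end - start;
        }

        @Override
        public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException(String.valueOf(index));
            }
            return data[start + index];
        }

        @Override
        public CharSequence subSequence(int from, int to) {
            if (from < 0 || to > length() || from > to) {
                throw new IndexOutOfBoundsException(from + ".." + to);
            }
            return new CharArraySequence(data, start + from, start + to); // shares the array
        }

        @Override
        public String toString() {
            return new String(data, start, end - start);
        }
    }

    public static void main(String[] args) {
        char[] chars = "junk\n#ifdef FEATURE\nbody\n#endif FEATURE\njunk".toCharArray();
        CharSequence input = new CharArraySequence(chars, 0, chars.length);
        // Pattern.matcher(CharSequence) accepts the custom type directly
        Matcher m = Pattern.compile("^#ifdef FEATURE$", Pattern.MULTILINE).matcher(input);
        while (m.find()) {
            System.out.println("found at " + m.start() + ": " + m.group());
        }
    }
}
```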
https://github.com/fge/largetext