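Following on from my other question, I've decided I can't even create a regular expression that matches Roman numerals (let alone a context-free grammar that generates them).
The problem is matching only valid Roman numerals. For example, 990 is not "XM", it's "CMXC".
My problem in building the regex is that, in order to allow or disallow certain characters, I need to look back. Let's take the thousands and hundreds as an example.
I can allow M{0,2}C?M (to allow for 900, 1000, 1900, 2000, 2900 and 3000). However, if the match is on CM, I can't allow the following character to be C or D (because I'm already at 900).
How can I express this in a regex? If it simply can't be expressed in a regex, can it be expressed in a context-free grammar?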
You can use the following regex for this:
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
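As a quick way to try the pattern out, here is a minimal Python sketch. The helper name is_roman is made up for illustration, and Python's re module is just one possible engine; the pattern itself is the one given above.

```python
import re

# The pattern from the answer, one alternation group per decimal place.
ROMAN_RE = re.compile(
    r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def is_roman(text):
    """Return True if the whole string is accepted by the pattern.

    is_roman is a hypothetical helper name. fullmatch is used so that a
    trailing newline is not silently accepted by the $ anchor.
    """
    return ROMAN_RE.fullmatch(text) is not None


print(is_roman("CMXC"))  # True: the canonical form of 990
print(is_roman("XM"))    # False: not a valid way to write 990
print(is_roman("CMC"))   # False: nothing may add hundreds after CM
```

Because each decimal place is a self-contained alternation, no look-back is needed: once CM has been consumed, the only groups left are the tens and units, so a following C or D simply has nowhere to match.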
Breaking it down, M{0,4} specifies the thousands section and basically restricts it to between 0 and 4000. It's a relatively simple one:
0: <empty> matched by M{0}
1000: M matched by M{1}
2000: MM matched by M{2}
3000: MMM matched by M{3}
4000: MMMM matched by M{4}
You could, of course, use something like M* to allow any number (including zero) of thousands, if you want to allow bigger numbers.
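To illustrate the difference, the following Python sketch compares the bounded and unbounded thousands sections. The names BOUNDED_RE and UNBOUNDED_RE are chosen here for illustration only.

```python
import re

# Same hundreds/tens/units groups; only the thousands part differs.
TAIL = r"(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
BOUNDED_RE = re.compile(r"M{0,4}" + TAIL)   # at most four Ms
UNBOUNDED_RE = re.compile(r"M*" + TAIL)     # any number of Ms

six_thousand_twelve = "MMMMMMXII"  # six Ms followed by XII

print(BOUNDED_RE.fullmatch(six_thousand_twelve) is not None)    # False
print(UNBOUNDED_RE.fullmatch(six_thousand_twelve) is not None)  # True
```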
Next is (CM|CD|D?C{0,3}), which is slightly more complex. This is for the hundreds section and covers all the possibilities:
0: <empty> matched by D?C{0} (with D not there)
100: C matched by D?C{1} (with D not there)
200: CC matched by D?C{2} (with D not there)
300: CCC matched by D?C{3} (with D not there)
400: CD matched by CD
500: D matched by D?C{0} (with D there)
600: DC matched by D?C{1} (with D there)
700: DCC matched by D?C{2} (with D there)
800: DCCC matched by D?C{3} (with D there)
900: CM matched by CM
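One way to convince yourself that the hundreds group accepts exactly these ten strings and nothing else is to brute-force every short string over the letters C, D and M. This is only a sketch; the variable names are made up for illustration.

```python
import re
from itertools import product

HUNDREDS_RE = re.compile(r"(CM|CD|D?C{0,3})")

# Every string of length 0..4 over the three letters the group can use.
candidates = (
    "".join(letters)
    for length in range(5)
    for letters in product("CDM", repeat=length)
)

# Keep only the strings the group matches in full, shortest first.
accepted = sorted(
    (s for s in candidates if HUNDREDS_RE.fullmatch(s)),
    key=lambda s: (len(s), s),
)

print(len(accepted))  # 10
print(accepted)
```

The ten accepted strings correspond one-to-one with the hundreds digits 0 through 9, which is why forms such as CMC or DD can never get through.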
Thirdly, (XC|XL|L?X{0,3}) follows the same rules as the previous section, but for the tens place:
0: <empty> matched by L?X{0} (with L not there)
10: X matched by L?X{1} (with L not there)
20: XX matched by L?X{2} (with L not there)
30: XXX matched by L?X{3} (with L not there)
40: XL matched by XL
50: L matched by L?X{0} (with L there)
60: LX matched by L?X{1} (with L there)
70: LXX matched by L?X{2} (with L there)
80: LXXX matched by L?X{3} (with L there)
90: XC matched by XC
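And, finally, (IX|IV|V?I{0,3}) is the units section, handling 0 through 9 in the same way as the previous two sections (Roman numerals, despite their seeming weirdness, follow some logical rules once you figure out what they are):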
0: <empty> matched by V?I{0} (with V not there)
1: I matched by V?I{1} (with V not there)
2: II matched by V?I{2} (with V not there)
3: III matched by V?I{3} (with V not there)
4: IV matched by IV
5: V matched by V?I{0} (with V there)
6: VI matched by V?I{1} (with V there)
7: VII matched by V?I{2} (with V there)
8: VIII matched by V?I{3} (with V there)
9: IX matched by IX
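The per-digit tables above can also be turned around to generate numerals, which gives a simple cross-check of the whole pattern. The sketch below is an illustration rather than part of the original answer: to_roman and the three lookup lists are hypothetical names, and the 0 to 4999 range follows from the M{0,4} limit.

```python
import re

ROMAN_RE = re.compile(
    r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

# One entry per decimal digit 0..9, copied from the breakdown tables.
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]


def to_roman(n):
    """Build the numeral for n digit by digit (hypothetical helper)."""
    if not 0 <= n <= 4999:
        raise ValueError("only 0..4999 is covered by M{0,4}")
    return (
        "M" * (n // 1000)
        + HUNDREDS[n // 100 % 10]
        + TENS[n // 10 % 10]
        + UNITS[n % 10]
    )


# Every generated numeral should be accepted by the pattern.
all_accepted = all(ROMAN_RE.fullmatch(to_roman(n)) for n in range(5000))

print(to_roman(990))   # CMXC
print(to_roman(1994))  # MCMXCIV
print(all_accepted)    # True
```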
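Keep in mind that this regex will also match an empty string. If you don't want that (and your regex engine is modern enough to support lookaround), you can add a positive lookahead right after the start anchor that requires at least one Roman numeral character:
^(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$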
(The other alternative is to simply check beforehand that the length is not zero.)
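The length-check alternative might look like this in Python. This is a sketch with an assumed helper name, is_nonempty_roman, wrapped around the unmodified pattern.

```python
import re

ROMAN_RE = re.compile(
    r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def is_nonempty_roman(text):
    """Reject the empty string first, then apply the pattern.

    is_nonempty_roman is a hypothetical helper name.
    """
    return len(text) > 0 and ROMAN_RE.fullmatch(text) is not None


print(ROMAN_RE.fullmatch("") is not None)  # True: the bare pattern accepts ""
print(is_nonempty_roman(""))               # False
print(is_nonempty_roman("IX"))             # True
```

This keeps the pattern itself free of lookaround, so it also works with engines that do not support it.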