如何在 Excel 中使用正则表达式并利用 Excel 强大的类似网格的设置进行数据操作?
我知道正则表达式在许多情况下并不理想(使用或不使用正则表达式?),因为 excel 可以使用Left, Mid, Right, Instrtype 命令进行类似的操作。
Left
Mid
Right
Instr
正则表达式用于模式匹配。
要在 Excel 中使用,请执行以下步骤:
第 1 步 :将 VBA 引用添加到“Microsoft VBScript 正则表达式 5.5”
第 2 步 :定义您的模式
基本定义:
- 范围。
-
a-z
0-5
[]完全匹配这些括号内的对象之一。
[]
[a]
[abc]
[a-z]
()为返回目的对不同的匹配进行分组。请参阅下面的示例。
()
{}在它之前定义的模式的重复副本的乘数。
{}
[a]{2}
aa
[a]{1,3}
a
aaa
+ 匹配至少一个或多个在它之前定义的模式。
+
a+
? 匹配零个或一个在它之前定义的模式。
?
[a-z]?
* 匹配零个或多个在它之前定义的模式。
*
[a-z]*
. 匹配除换行符以外的任何字符\n
.
\n
a.
| 或运算符
|
a|b
b
red|white|orange
^ 非运算符
^
[^0-9]
[^aA]
A
\ 转义后面的特殊字符(覆盖上述行为)
\
\.
\\
\(
\?
\$
\^
锚定模式:
^ 匹配必须发生在字符串的开头
^a
^[0-9]
$ 匹配必须出现在字符串的末尾
$
a$
优先表:
Order Name Representation 1 Parentheses ( ) 2 Multipliers ? + * {m,n} {m, n}? 3 Sequence & Anchors abc ^ $ 4 Alternation |
预定义的字符缩写:
abr same as meaning \d [0-9] Any single digit \D [^0-9] Any single character that's not a digit \w [a-zA-Z0-9_] Any word character \W [^a-zA-Z0-9_] Any non-word character \s [ \r\t\n\f] Any space character \S [^ \r\t\n\f] Any non-space character \n [\n] New line
示例 1 : 作为宏运行
下面的示例宏查看单元格中的值A1以查看前 1 或 2 个字符是否为数字。如果是这样,它们将被删除并显示字符串的其余部分。如果没有,则会出现一个框,告诉您未找到匹配项。单元格A1值12abc将返回abc,值1abc将返回abc,值abc123将返回“不匹配”,因为数字不在字符串的开头。
A1
12abc
abc
1abc
abc123
Private Sub simpleRegex() Dim strPattern As String: strPattern = "^[0-9]{1,2}" Dim strReplace As String: strReplace = "" Dim regEx As New RegExp Dim strInput As String Dim Myrange As Range Set Myrange = ActiveSheet.Range("A1") If strPattern <> "" Then strInput = Myrange.Value With regEx .Global = True .MultiLine = True .IgnoreCase = False .Pattern = strPattern End With If regEx.Test(strInput) Then MsgBox (regEx.Replace(strInput, strReplace)) Else MsgBox ("Not matched") End If End If End Sub
示例 2 : 作为单元内函数运行
此示例与示例 1 相同,但设置为作为单元内函数运行。要使用,请将代码更改为:
Function simpleCellRegex(Myrange As Range) As String Dim regEx As New RegExp Dim strPattern As String Dim strInput As String Dim strReplace As String Dim strOutput As String strPattern = "^[0-9]{1,3}" If strPattern <> "" Then strInput = Myrange.Value strReplace = "" With regEx .Global = True .MultiLine = True .IgnoreCase = False .Pattern = strPattern End With If regEx.test(strInput) Then simpleCellRegex = regEx.Replace(strInput, strReplace) Else simpleCellRegex = "Not matched" End If End If End Function
将您的字符串(“12abc”)放在 cell 中A1。=simpleCellRegex(A1)在单元格中输入此公式,B1结果将为“abc”。
=simpleCellRegex(A1)
B1
示例 3 : 循环范围
此示例与示例 1 相同,但循环遍历一系列单元格。
Private Sub simpleRegex() Dim strPattern As String: strPattern = "^[0-9]{1,2}" Dim strReplace As String: strReplace = "" Dim regEx As New RegExp Dim strInput As String Dim Myrange As Range Set Myrange = ActiveSheet.Range("A1:A5") For Each cell In Myrange If strPattern <> "" Then strInput = cell.Value With regEx .Global = True .MultiLine = True .IgnoreCase = False .Pattern = strPattern End With If regEx.Test(strInput) Then MsgBox (regEx.Replace(strInput, strReplace)) Else MsgBox ("Not matched") End If End If Next End Sub
示例 4 :拆分不同的模式
此示例循环遍历一个范围 ( A1, A2& A3) 并查找以三个数字开头的字符串,后跟一个字母字符,然后是 4 个数字。输出使用 . 将模式匹配拆分为相邻的单元格()。 $1表示在第一组 中匹配的第一个模式()。
A2
A3
$1
Private Sub splitUpRegexPattern() Dim regEx As New RegExp Dim strPattern As String Dim strInput As String Dim Myrange As Range Set Myrange = ActiveSheet.Range("A1:A3") For Each C In Myrange strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})" If strPattern <> "" Then strInput = C.Value With regEx .Global = True .MultiLine = True .IgnoreCase = False .Pattern = strPattern End With If regEx.test(strInput) Then C.Offset(0, 1) = regEx.Replace(strInput, "$1") C.Offset(0, 2) = regEx.Replace(strInput, "$2") C.Offset(0, 3) = regEx.Replace(strInput, "$3") Else C.Offset(0, 1) = "(Not matched)" End If End If Next End Sub
结果:
其他模式示例
String Regex Pattern Explanation a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single digit, three alpha characters a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May or may not have preceding alpha character a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3} Single alpha, single digit, 0 to 3 alpha characters a1aaa [a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha characters </i8> \<\/[a-zA-Z][0-9]\> Exact non-word character except any single alpha followed by any single digit