
Python Regular Expressions

news 2026/6/23 0:27:04

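A regular expression is a powerful text-processing tool that lets you search for, match, and manipulate patterns in text. Python supports regular expressions through the `re` module.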


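Three common matching functions

Aspect	`re.match()`	`re.search()`	`re.findall()`
Where matching starts	Only at index 0	At any position	Scans the whole string
Number of matches	0 or 1	0 or 1	0 or more
Return type	Match object / `None`	Match object / `None`	List (possibly empty)
Typical use	Checking a prefix, validating a format	Testing whether a pattern occurs	Bulk extraction, data analysis
Performance	Fast (only tries position 0)	Moderate (tries successive start positions until the first match)	Slower (always scans the whole string)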

    import re

    pattern = "apple"
    text = "appleI like apples. apple"

    match = re.match(pattern, text)
    search = re.search(pattern, text)
    findall = re.findall(pattern, text)

    print(match.group())   # matches only at the start of the string: apple
    print(search.group())  # like match, but not restricted to the start: apple
    print(findall)         # all matches, returned as a list: ['apple', 'apple', 'apple']
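The example above always finds a match. When nothing matches, `re.match()` and `re.search()` return `None`, and calling `.group()` on that raises `AttributeError`. A minimal sketch of a guard follows; the helper name `first_match` is hypothetical and not part of the `re` module.

```python
import re


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches.

    Hypothetical helper for illustration; not part of the re module.
    """
    m = re.search(pattern, text)
    return m.group() if m is not None else None


text = "I like apples. apple"

# re.match only tries position 0, so it fails here even though
# "apple" occurs later in the string.
at_start = re.match("apple", text)
anywhere = first_match("apple", text)
missing = first_match("banana", text)

print(at_start)  # None
print(anywhere)  # apple
print(missing)   # None
```

Checking for `None` before touching the Match object is the usual pattern whenever the input is not guaranteed to contain the pattern.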

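Matching single characters

`.` matches any single character (except a newline, unless `re.S` is set)

Each `.` stands for exactly one character, so n dots match n characters.

    import re

    pattern = "."
    pattern2 = ".."
    pattern3 = "..."
    text = "python"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern3, text)

    print(match.group())   # p
    print(search.group())  # py
    print(findall)         # all matches: ['pyt', 'hon']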

`\d` matches a digit

Each `\d` stands for exactly one digit.

    import re

    pattern = r"\d"
    pattern2 = r"\d\d"
    text = "12python43"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # 1
    print(search.group())  # 12
    print(findall)         # all matches: ['12', '43']

`\D` matches a non-digit

Each `\D` stands for exactly one non-digit character.

    import re

    pattern = r"\d"
    pattern2 = r"\d\D"
    text = "12pytho4n3"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # 1
    print(search.group())  # 2p
    print(findall)         # all matches: ['2p', '4n']

`\s` matches a whitespace character, such as a space, tab, or newline

    import re

    pattern = r"\s\D"
    pattern2 = r"\s\d"
    text = " hello 2p 3ython"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # ' h'
    print(search.group())  # ' 2'
    print(findall)         # all matches: [' 2', ' 3']

`\S` matches a non-whitespace character

    import re

    pattern = r"\s\S"
    pattern2 = r"\s\S"
    text = " hello 2p 3ython"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # ' h'
    print(search.group())  # ' h'
    print(findall)         # all matches: [' h', ' 2', ' 3']

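`\w` matches a word character: upper- and lowercase letters, digits, and the underscore `_`

It matches only letters, digits, and the underscore; punctuation and other symbols do not match. For `str` patterns, "letters" includes non-ASCII letters such as Chinese characters unless the `re.ASCII` flag is set.

    import re

    pattern = r"\w"
    pattern2 = r"\w\w"
    text = "2_hello 2p 3ython"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # '2'
    print(search.group())  # '2_'
    print(findall)         # all matches: ['2_', 'he', 'll', '2p', '3y', 'th', 'on']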
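What counts as a "letter" for `\w` depends on the flags. The sketch below compares the default Unicode-aware behaviour of `str` patterns with `re.ASCII`; the sample text is made up for illustration.

```python
import re

# Sample text (an assumption for illustration): ASCII, Chinese, and an
# accented Latin letter written as an escape to avoid encoding ambiguity.
text = "user_01 名字 caf\u00e9!"

# Default for str patterns: \w is Unicode-aware.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['user_01', '名字', 'café']
print(ascii_words)    # ['user_01', 'caf']
```

If the input must be limited to ASCII identifiers, passing `re.ASCII` or writing the class out as `[A-Za-z0-9_]` makes that explicit.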

`\W` matches a non-word character

    import re

    pattern = r"\W"
    pattern2 = r"\W\W\W"
    text = "! @_hello(@) 2p 3ython"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # '!'
    print(search.group())  # '! @'
    print(findall)         # all matches: ['! @', '(@)']

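`[]` matches one of the characters listed inside `[]`

Only a character listed inside the brackets is accepted at that position.

    import re

    pattern = r"\W[ah][e]"  # a non-word character, then one of a/h, then e
    pattern2 = "[helo]"     # any single character out of h, e, l, o
    text = "@hello(@) 2p 3ython"

    match = re.match(pattern, text)
    search = re.search(pattern2, text)
    findall = re.findall(pattern2, text)

    print(match.group())   # @he
    print(search.group())  # h
    print(findall)         # all matches: ['h', 'e', 'l', 'l', 'o', 'h', 'o']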

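`[^2345]` matches any single character except 2, 3, 4, or 5
`[a-z3-5]` matches any single character in the range a-z or 3-5

    import re

    pattern = "[^2345]"   # any character except 2, 3, 4, 5
    pattern2 = "[^d-h]"   # any character except the letters d through h
    text = "2hello world"

    match = re.match(pattern, text)
    search = re.search(pattern, text)
    findall = re.findall(pattern2, text)

    print(match)           # None (the first character is 2, which is excluded)
    print(search.group())  # h
    print(findall)         # all matches: ['2', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l']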
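The range form `[a-z3-5]` is described above but not exercised. A short sketch with a made-up input string shows the class and its negation side by side.

```python
import re

# Made-up sample input for illustration.
text = "a1B3z5-7"

# Lowercase letters a-z plus the digits 3, 4, 5.
in_class = re.findall(r"[a-z3-5]", text)

# A leading ^ inside the brackets negates the whole class.
not_in_class = re.findall(r"[^a-z3-5]", text)

print(in_class)      # ['a', '3', 'z', '5']
print(not_in_class)  # ['1', 'B', '-', '7']
```

Every character of the input lands in exactly one of the two lists, because a class and its negation partition the characters between them.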

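Quantifiers

`*` matches zero or more times

    import re

    findall = re.findall("l*", "2hello world")
    # l* can match the empty string, so every position without an l yields ''
    print(findall)  # ['', '', '', 'll', '', '', '', '', '', 'l', '', '']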

`+` matches one or more times

    import re

    findall = re.findall("l+", "2hello world")
    print(findall)  # ['ll', 'l']

`?` matches zero or one time

    import re

    findall = re.findall("l?", "2hello world")
    print(findall)  # ['', '', '', 'l', 'l', '', '', '', '', '', 'l', '', '']

`{m}` matches exactly m times

    import re

    findall = re.findall("l{2}", "2hello lll lll world")
    print(findall)  # ['ll', 'll', 'll']

`{m,}` matches at least m times

    import re

    findall = re.findall("l{2,}", "2hello lll lll world")
    print(findall)  # ['ll', 'lll', 'lll']

`{m,n}` matches between m and n times, inclusive

    import re

    findall = re.findall("l{2,2}", "2hello lll lll world")  # {2,2} is equivalent to {2}
    print(findall)  # ['ll', 'll', 'll']
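With `{2,2}` the lower and upper bounds coincide, so the range aspect of `{m,n}` is not visible. The sketch below uses `{2,3}` on a made-up string of runs of `l` to show both bounds taking effect.

```python
import re

# Runs of 1 to 5 "l" characters, separated by spaces (made-up input).
text = "l ll lll llll lllll"

runs = re.findall(r"l{2,3}", text)

# - "l"     is too short and is skipped
# - "llll"  yields "lll"; the leftover single "l" is too short
# - "lllll" yields "lll" and then "ll"
print(runs)  # ['ll', 'lll', 'lll', 'lll', 'll']
```

Because quantifiers are greedy, each match takes up to three characters first, and scanning then resumes from where that match ended.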

Matching boundaries

`^` matches at the start of the string

    import re

    findall = re.findall("^2", "2hello lll lll world")
    findall2 = re.findall("^2h", "2hello lll lll world")

    print(findall)   # ['2']
    print(findall2)  # ['2h']

`$` matches at the end of the string

    import re

    findall = re.findall(".*d$", "2hello lll lll world")
    findall2 = re.findall(".*d$", "2hello lll lll world hh")
    findall3 = re.findall(".*d", "2hello lll lll world hh")

    print(findall)   # ['2hello lll lll world']
    print(findall2)  # [] (the string does not end in d)
    print(findall3)  # ['2hello lll lll world']
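Using `^` and `$` together is a common way to require that the pattern cover the whole string. The sketch below shows this with a hypothetical helper named `is_four_digits`.

```python
import re


def is_four_digits(s):
    """Hypothetical validator: True if s consists of exactly four digits."""
    # ^ anchors the start and $ anchors the end, so nothing may
    # precede or follow the four digits.
    return re.search(r"^\d{4}$", s) is not None


print(is_four_digits("2026"))    # True
print(is_four_digits("2026 "))   # False (trailing space)
print(is_four_digits("x2026"))   # False (leading character)
print(is_four_digits("20265"))   # False (five digits)
print(is_four_digits("2026\n"))  # True: $ also matches just before a final newline
```

The last line is a known subtlety of `$`. If a trailing newline must be rejected as well, `re.fullmatch()` is one way to do that.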

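`\b` matches at a word boundary

`\b`: in the examples below, the character to the right of `ve` must not be a letter, digit, or underscore.

    import re

    # The r prefix makes a raw string, so \b is not read as a backspace escape
    match1 = re.match(r'.*ve\b', 've.2testaabcd')
    match2 = re.match(r'.*ve\b', 've2testaabcd')
    match3 = re.match(r'.*ve\b', 'veatestaabcd')

    print(match1.group())  # ve
    print(match2)          # None
    print(match3)          # None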

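`\B` matches where there is no word boundary

    import re

    # The r prefix makes a raw string, so \B reaches the regex engine unchanged
    match1 = re.match(r'.*ve\B', 've.2testaabcd')
    match2 = re.match(r'.*ve\B', 've2testaabcd')
    match3 = re.match(r'.*ve\B', 'veatestaabcd')

    print(match1)          # None
    print(match2.group())  # ve
    print(match3.group())  # ve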
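A typical use of `\b` is whole-word search. The sketch below uses a made-up sentence to compare `\b` on both sides, `\b` on the left only, and `\B` on the left.

```python
import re

# Made-up sample sentence.
text = "cat concat cats cat."

# Boundary on both sides: only the standalone word "cat".
whole_word = re.findall(r"\bcat\b", text)

# Boundary on the left only: words that start with "cat".
starts_with = re.findall(r"\bcat\w*", text)

# No boundary on the left: "cat" preceded by another word character.
inside_word = re.findall(r"\Bcat", text)

print(whole_word)   # ['cat', 'cat']
print(starts_with)  # ['cat', 'cats', 'cat']
print(inside_word)  # ['cat']
```

The `cat` inside `concat` is the only occurrence preceded by a word character, which is why it is the single hit for `\Bcat` and is absent from the other two results.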

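Alternation and grouping

`|` matches either the expression on its left or the one on its right

Alternatives are tried from left to right and the first one that matches wins; the two sides are never both taken for the same match.

    import re

    # The r prefix keeps Python from interpreting the backslashes in the pattern
    match1 = re.findall(r'\d[1-9]|\d[1-9]\D[a-z]', '12ve.2testaabcd')
    match2 = re.match(r'\d[1-9]|\D[a-z]', 've2testaabcd')
    match3 = re.match(r'\d[1-9]|\D[a-z]', '32veatestaabcd')

    print(match1)          # ['12'], not '12ve': once the left alternative matches, the right one is not tried
    print(match2.group())  # ve
    print(match3.group())  # 32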
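Because the leftmost alternative that matches wins, the order of alternatives matters when one is a prefix of the other. The sketch below reuses the two alternatives from the example above and simply swaps their order.

```python
import re

text = "12ve"

# Short alternative first: it matches "12" and the longer one is never tried.
short_first = re.findall(r"\d[1-9]|\d[1-9]\D[a-z]", text)

# Long alternative first: it gets the chance to consume all of "12ve".
long_first = re.findall(r"\d[1-9]\D[a-z]|\d[1-9]", text)

print(short_first)  # ['12']
print(long_first)   # ['12ve']
```

A common rule of thumb is to list the longer or more specific alternative first when alternatives overlap.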

`(ab)` treats the characters inside the parentheses as one group

    import re

    match = re.search(r'(.*\d[1-9])(\D[a-z])', 've23testaabcd')
    print(match.groups())  # ('ve23', 'te')
    print(match.group())   # ve23te (the default argument of group is 0)
    print(match.group(0))  # ve23te
    print(match.group(1))  # ve23
    print(match.group(2))  # te

    b = re.match(r'<h1>(.*)(<h1>)', '<h1>你好啊<h1>')
    print(b.groups())  # two pairs of parentheses give a tuple of two elements: ('你好啊', '<h1>')
    print(b.group(0))  # group(0) is the whole match: <h1>你好啊<h1>
    print(b.group(1))  # group(1) is the content of group 1: 你好啊
    print(b.group(2))  # group(2) is the content of group 2: <h1>
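Groups are mostly used to pull structured fields out of a match. The sketch below uses made-up input strings to unpack `groups()` into variables and to show that `re.findall()` returns tuples when the pattern has more than one group.

```python
import re

# Made-up inputs for illustration.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released on 2026-06-23")
year, month, day = m.groups()

print(m.group(0))        # 2026-06-23
print(year, month, day)  # 2026 06 23

# With two or more groups, findall returns one tuple per match.
pairs = re.findall(r"(\w+)=(\d+)", "w=640 h=480")
print(pairs)  # [('w', '640'), ('h', '480')]
```

The captured values are always strings, so they need an explicit `int()` conversion before any arithmetic.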

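Greedy and non-greedy matching

In Python, a `?` placed after a quantifier turns off greedy matching.
Quantifiers are greedy by default: they always try to match as many characters as possible.
For example, `+` means one or more times and by default takes as many repetitions as it can; with `?` appended (`+?`) it takes as few as possible, which in the example below is one.

    import re

    match1 = re.match(r"aa\d+", "aa2323")   # matches as many \d as possible
    match2 = re.match(r"aa\d+?", "aa2323")  # matches as few \d as possible

    print(match1.group())  # aa2323
    print(match2.group())  # aa2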
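The difference becomes more visible when a delimiter occurs several times. The sketch below uses a made-up HTML-like string; it is for illustration only and is not a recommendation to parse HTML with regular expressions.

```python
import re

# Made-up markup for illustration.
html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the LAST </b>, swallowing both elements.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the FIRST </b> each time.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```

The non-greedy form still has to let the rest of the pattern match, so it expands one character at a time until `</b>` can follow.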

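`re.S`

`re.S` (an alias for `re.DOTALL`) is a flag that can be passed to `re.findall()` and the other matching functions.

Suppose the string `a` contains the newline character `\n`:

Without `re.S`, `.` does not match `\n`, so `.*` cannot cross a line break and a match has to fit within a single line.
With `re.S`, `.` matches `\n` as well, so the regular expression treats the string as a whole and a match can span lines.

    import re

    a = "re\nhello\nworld123"

    print(re.findall(r'hello.*123', a))        # [] no match, because of the line break
    print(re.findall(r'hello.*123', a, re.S))  # ['hello\nworld123'] matched across the line break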
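As a cross-check of the behaviour described above, the sketch below confirms that `re.S` and `re.DOTALL` are the same flag, and shows one flag-free alternative: the class `[\s\S]`, which matches any character including a newline. The input string is made up.

```python
import re

text = "hello\nworld123"

print(re.S == re.DOTALL)  # True: two names for one flag

no_flag = re.search(r"hello.*123", text)
with_flag = re.search(r"hello.*123", text, re.DOTALL)

# [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
without_dot = re.search(r"hello[\s\S]*123", text)

print(no_flag)                     # None
print(repr(with_flag.group()))     # 'hello\nworld123'
print(repr(without_dot.group()))   # 'hello\nworld123'
```

The flag changes the meaning of every `.` in the pattern, while `[\s\S]` affects only the position where it is written.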

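`re.compile()`

`re.compile()` compiles a regular expression given as a string into a reusable Pattern object.

Executing a regular expression usually involves two stages:

Stage 1: parse the pattern string and build the internal matching rules (the "compile" step);
Stage 2: use those rules to match against the target text.
Module-level functions such as re.search(pattern, text) have to obtain the compiled pattern on every call. The `re` module caches recently compiled patterns, so the pattern is normally not re-parsed each time, but every call still pays for the cache lookup and an extra function call. With re.compile(pattern) the compile step is done once, explicitly, and later calls use the compiled object directly, which avoids that per-call overhead.

    import re
    import time

    text = "请联系我：13812345678，谢谢！"
    pattern_str = r'1[3-9]\d{9}'  # pattern for mainland China mobile numbers

    # Option 1: no compile (every call goes through the module-level function and its pattern cache)
    start = time.time()
    for _ in range(100_000):
        re.search(pattern_str, text)
    elapsed_no_compile = time.time() - start

    # Option 2: compile once and reuse the Pattern object
    compiled_pattern = re.compile(pattern_str)
    start = time.time()
    for _ in range(100_000):
        compiled_pattern.search(text)
    elapsed_with_compile = time.time() - start

    # The figures in the comments come from the author's run and will vary by machine
    print(f"Without precompiling: {elapsed_no_compile:.4f} s")   # Without precompiling: 0.0540 s
    print(f"With precompiling: {elapsed_with_compile:.4f} s")    # With precompiling: 0.0210 s
    print(f"Speedup: {elapsed_no_compile / elapsed_with_compile:.2f}x")  # Speedup: 2.57x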
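Apart from timing, a compiled Pattern object is convenient because it carries the same methods as the module (`search`, `match`, `findall`, and so on) and can be defined once and reused. A minimal sketch follows; the constant name `PHONE` and the sample messages are made up for illustration.

```python
import re

# Compiled once at import time and reused below.
# PHONE is a hypothetical name; the pattern is the one from the example above.
PHONE = re.compile(r"1[3-9]\d{9}")

# Made-up sample messages.
messages = [
    "Please contact me: 13812345678, thanks!",
    "no number here",
    "call 15900001111 or 13000002222",
]

found = [PHONE.findall(msg) for msg in messages]

for msg, numbers in zip(messages, found):
    print(numbers)
# ['13812345678']
# []
# ['15900001111', '13000002222']
```

Giving the pattern a name also documents its intent at the call site, which is often a better reason to use `re.compile()` than the speed difference.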