
Advanced Python Syntax Notes (8)

news 2026/3/26 21:09:51

I. Regular Expressions

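Definition: code that describes text patterns; a tool for processing strings
Note: the re module must be imported
Characteristics: the syntax is fairly complex and hard to read, but regular expressions are highly portable and are supported by many programming languages
Steps:
1. Import the re module
2. Match with the match method: re.match() only matches at the beginning of the string; if the start of the string does not match, it returns None
  re.match(pattern, string, flags=0)
  pattern: the regular expression to match
  string: the string to match against
  flags: optional matching flags, e.g. re.IGNORECASE
  Note: match only checks the beginning of the string (no match there means no result), and the whole pattern must match
3. If the previous step succeeded, extract the matched text with group()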
```
import re

# re.match(pattern, string, flags=0)
# pattern: the regular expression to match
# string:  the string to match against
# flags:   matching flags
s = re.match("星", "星月夜星")  # matches only at the start of the string
print(s)          # <re.Match object; span=(0, 1), match='星'>
print(s.group())  # 星
```
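Because re.match() returns None when the start of the string does not match, calling .group() on the result directly can raise AttributeError. A minimal sketch of guarding the call; the helper name `first_match` is hypothetical and only for illustration:

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the matched text, or None if re.match() fails."""
    m = re.match(pattern, text)
    # re.match() only looks at the beginning of the string
    return m.group() if m else None

print(first_match("星", "星月夜星"))  # 星
print(first_match("月", "星月夜星"))  # None, because "月" is not at the start
```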

II. Matching a Single Character

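| Character | Function |
| --- | --- |
| `.` | Matches any single character except `\n` |
| `[]` | Matches one of the characters listed inside `[]` |
| `\d` | Matches a digit, e.g. 0-9 |
| `\D` | Matches a non-digit |
| `\s` | Matches whitespace: space, tab (`\t`), newline (`\n`), etc. |
| `\S` | Matches a non-whitespace character |
| `\w` | Matches a word character: a-z, A-Z, 0-9, `_` (for str patterns in Python 3, also other Unicode letters such as Chinese characters) |
| `\W` | Matches a non-word character |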

```
import re

res2 = re.match("[he]", "hello")
print(res2.group())  # h
res3 = re.match("[1-9]", "1234")
print(res3.group())  # 1
res4 = re.match("[a-zA-Z]", "Hello")  # a-zA-Z covers all lowercase and uppercase letters
print(res4.group())  # H

# \d: a digit
res5 = re.match(r"\d", "91234")
print(res5.group())  # 9
# \D: a non-digit
res6 = re.match(r"\D", "。a1234")
print(res6.group())  # 。
# \s: a whitespace character (space, tab, newline, ...)
res7 = re.match(r"\s.", " hello")
print(res7.group())  # " h" (a space followed by h)
# \S: a non-whitespace character
res8 = re.match(r"\S", "1hello")
print(res8.group())  # 1
# \w: letters (including Chinese characters), digits and underscore
res9 = re.match(r"\w", "你好_hello")
print(res9.group())  # 你
# \W: anything other than letters, digits and underscore
res10 = re.match(r"\W", "/hello")
print(res10.group())  # /
```
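In Python 3, for str patterns `\s` also matches tabs and newlines, and `\w` is Unicode-aware, which is why a Chinese character counts as a word character. Passing the re.ASCII flag restricts `\w` to `[a-zA-Z0-9_]`. A short sketch of both behaviours:

```python
import re

# \s matches a tab as well as a space
print(re.match(r"\s", "\thello") is not None)  # True

# By default \w is Unicode-aware, so a Chinese character is a word character
print(re.match(r"\w", "你好").group())  # 你

# With re.ASCII, \w only covers a-z, A-Z, 0-9 and _
print(re.match(r"\w", "你好", re.ASCII))  # None
```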

III. Matching Multiple Characters

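| Character | Function |
| --- | --- |
| `*` | Matches the preceding character 0 or more times, i.e. it is optional and may repeat |
| `+` | Matches the preceding character 1 or more times, i.e. at least once |
| `?` | Matches the preceding character 0 or 1 time, i.e. either once or not at all |
| `{m}` | Matches the preceding character exactly m times |
| `{m,n}` | Matches the preceding character from m to n times |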

```
import re

# * : the preceding character 0 or more times
res = re.match(r"\w*", "hello world")
res1 = re.match(".*", "hello world")
print(res.group())   # hello
print(res1.group())  # hello world

# + : the preceding character 1 or more times, i.e. at least once
res2 = re.match(r"\d+", "11hello world")
print(res2.group())  # 11

# ? : the preceding character 0 or 1 time
res3 = re.match(r"\w?", "hello world")
print(res3.group())  # h

# {m} : the preceding character exactly m times
res4 = re.match(r"\w{2}", "hello world")
print(res4.group())  # he

# {m,n} : the preceding character m to n times (requires m <= n)
res5 = re.match(r"\w{1,9}", "hello world")
print(res5.group())  # hello (\w stops at the space)
```
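Because `*` (and `?`) can match zero characters, a "successful" match may be an empty string. The open-ended form `{m,}`, meaning "at least m times", is not in the table above but follows the same rule. A small sketch:

```python
import re

# * accepts zero occurrences, so the match succeeds with an empty string
m = re.match(r"\d*", "hello")
print(repr(m.group()))  # ''

# {m,} means "at least m times", with no upper bound
print(re.match(r"\d{2,}", "12345abc").group())  # 12345
print(re.match(r"\d{2,}", "1abc"))  # None, only one digit
```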

IV. Matching the Start and End

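| Character | Function |
| --- | --- |
| `^` | Matches the start of the string |
| `[^x]` | Matches any character other than x (negation inside `[]`) |
| `$` | Matches the end of the string |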

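```
import re

# ^ : matches the start of the string
# Note: inside [], ^ means negation, i.e. [^p] matches any character except p
res = re.match("^py", "python")  # starts with py
res1 = re.match("[^p]", "python")
print(res.group())  # py
# res1 is None because "python" starts with p, so the next line would raise
# AttributeError: 'NoneType' object has no attribute 'group'
# print(res1.group())

# $ : matches the end of the string
res2 = re.match(".*n$", "python")
print(res2.group())  # python
```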
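Combining `^` and `$` forces the pattern to cover the whole string, which is the usual way to validate input. A minimal sketch, assuming a hypothetical rule "exactly six digits"; it uses re.search() (covered later), which would otherwise find digits anywhere in the string:

```python
import re

def is_six_digits(s):
    # Hypothetical validator: ^ anchors at the start and $ at the end,
    # so nothing else may appear before or after the six digits
    return re.search(r"^\d{6}$", s) is not None

print(is_six_digits("123456"))   # True
print(is_six_digits("a123456"))  # False
print(is_six_digits("1234567"))  # False

# Caveat: $ also matches just before a trailing newline
print(is_six_digits("123456\n"))  # True
# re.fullmatch() requires the entire string to match, newline included
print(re.fullmatch(r"\d{6}", "123456\n"))  # None
```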

V. Grouping

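| Character | Function |
| --- | --- |
| `\|` | Matches either the expression on the left or the one on the right |
| `(ab)` | Treats the characters in the parentheses as one group |
| `\num` | Matches the same text that group num matched |
| `(?P<name>...)` | Gives a group a name |
| `(?P=name)` | Matches the same text that the group named name matched |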

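```
import re

# 1. | : matches either the left or the right expression
res = re.match("(python|java)", "python")
res1 = re.match(r"(.|\d)", "123")
print(res.group())   # python
print(res1.group())  # 1

# 2. (abc) : treats the characters in the parentheses as one group
#    \. matches a literal dot; an unescaped . would match any character
res2 = re.match(r"\w*@(163|qq|126)\.com", "123@163.com")
print(res2.group())  # 123@163.com

# 3. \num : matches the same text that group num matched, often used for paired tags
# res3 = re.match("<(\\w*)>\\w.*</\\1>", "<html>hello world</html>")  # without r, every \ must be doubled
# res3 = re.match(r"<(\w*)>\w.*</\1>", "<html>hello world</html>")    # \1 refers to group 1; r disables Python escapes
res3 = re.match(r"<(\w*)><(\w*)>\w.*</\2></\1>", "<html><body>hello world</body></html>")
print(res3.group())  # <html><body>hello world</body></html>
# Note: groups are numbered from 1 in the order of their opening parentheses
# (left to right, so an outer group comes before the groups nested in it)

# 4. (?P<name>...) : gives a group a name
# 5. (?P=name) : matches the same text as the group named name
res4 = re.match(r"<(?P<tag>\w*)><(?P<tag2>\w*)>\w.*</(?P=tag2)></(?P=tag)>", "<html><body>hello world</body></html>")
print(res4.group())  # <html><body>hello world</body></html>
```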
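Besides group(), a match object exposes each group on its own: group(n) by number, groups() for all of them, and group("name") / groupdict() for named groups. A short sketch reusing patterns similar to the ones above:

```python
import re

m = re.match(r"(\w+)@(163|qq|126)\.com", "123@163.com")
print(m.group(0))  # 123@163.com (the whole match)
print(m.group(1))  # 123
print(m.group(2))  # 163
print(m.groups())  # ('123', '163')

m2 = re.match(r"<(?P<tag>\w+)>.*</(?P=tag)>", "<p>hi</p>")
print(m2.group("tag"))  # p
print(m2.groupdict())   # {'tag': 'p'}
```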

Example:

```
import re

# Match URLs: the prefix is usually www, the suffix .com, .cn, .org, etc.
li = ["www.baidu.com", "www.google.com", "http.jd.cn", "www.python.org"]
# res = re.match(r'www(\.)\w*\1(com|cn|org)', 'www.baidu.com')
# print(res.group())
for i in li:
    # (\.) captures a literal dot; \1 requires the same dot again before the suffix
    res = re.match(r'www(\.)\w*\1(com|cn|org)', i)
    if res:
        print(res.group())
    else:
        print(f"{i} is not a valid URL format")
# www.baidu.com
# www.google.com
# http.jd.cn is not a valid URL format
# www.python.org
```

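VI. Advanced Usage: Regex Functions

re.match(): matches only at the beginning of the string and returns the first match found there (the whole pattern, not a single character);
re.search(): scans the entire string and returns a match object for the first position where the pattern matches, or None if it matches nowhere; it stops at the first match;

re.findall(): returns **a list** of all matches found in the entire string;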

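```
import re

res = re.search("python", "python hello world")
print(res.group())  # python

# re.findall(pattern, string, flags=0): searches the string and returns all matches as a list
res1 = re.findall("python", "python hello world，python")
print(res1)  # ['python', 'python']

# Summary:
# match():   matches from the start only; on success returns a match object (extract with group()),
#            on failure returns None; matches once.
# search():  scans from start to end; on success returns the first match object (extract with group()),
#            on failure returns None; matches once.
# findall(): scans from start to end; returns a list of every match; no group() call needed.
```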
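One detail of findall() worth knowing: when the pattern contains capturing groups, the list holds the group contents instead of the whole match (a tuple per match if there are several groups). re.finditer() yields full match objects instead. A small sketch:

```python
import re

text = "a@qq.com, b@163.com"
print(re.findall(r"\w+@\w+\.com", text))      # ['a@qq.com', 'b@163.com']
print(re.findall(r"\w+@(\w+)\.com", text))    # ['qq', '163']
print(re.findall(r"(\w+)@(\w+)\.com", text))  # [('a', 'qq'), ('b', '163')]

# finditer() gives match objects, so group() and span() are available
for m in re.finditer(r"\w+@\w+\.com", text):
    print(m.group(), m.span())
# a@qq.com (0, 8)
# b@163.com (10, 19)
```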

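re.sub(): replaces the matched text.
re.sub(pattern, repl, string, count=0, flags=0)
- pattern: the regular expression to match (the old content in the string that should be replaced)
- repl: the replacement string (the new content)
- string: the string to search
- count: the maximum number of replacements; the default 0 replaces every match
- flags: matching flags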
```
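import re

res = re.sub("python", "java", "python hello world，python")
print(res)   # java hello world，java
# count=1: replace only the first digit
# (pass count by keyword; passing it positionally is deprecated since Python 3.13)
res1 = re.sub(r"\d", "*", "今天是第1天，明天是第2天了", count=1)
print(res1)  # 今天是第*天，明天是第2天了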
```
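`repl` does not have to be a fixed string: it can reference groups with `\1` (or `\g<1>`), or it can be a function that receives each match object and returns the replacement. A sketch; doubling the numbers and the helper name `double` are illustrative choices only:

```python
import re

# repl as a function: double every number (hypothetical helper)
def double(m):
    return str(int(m.group()) * 2)

print(re.sub(r"\d+", double, "day 1, day 20"))  # day 2, day 40

# repl referencing groups: turn "year-month" into "month/year"
print(re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2026-03"))  # 03/2026
```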

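re.split(): splits the string at each match and returns a list.

re.split(pattern, string, maxsplit=0, flags=0)

pattern: the regular expression to split on
string: the string to split
maxsplit: the maximum number of splits; the default 0 splits at every match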

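```
import re

# "|" has an empty alternative on both sides, so it matches the empty string at every position
# and the string is split into single characters (splitting on empty matches works since Python 3.7)
res = re.split("|", "python hello world")
res1 = re.split("o", "python hello world")
res2 = re.split("o", "python hello world", maxsplit=1)  # split at most once
print(res)   # ['', 'p', 'y', 't', 'h', 'o', 'n', ' ', 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '']
print(res1)  # ['pyth', 'n hell', ' w', 'rld']
print(res2)  # ['pyth', 'n hello world']
```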
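Since split() takes a regular expression rather than a fixed separator, several different separators can be handled in one call; wrapping the pattern in a capturing group also keeps the separators in the result. A short sketch:

```python
import re

# Split on any run of commas, semicolons or whitespace
print(re.split(r"[,;\s]+", "a, b;c  d"))  # ['a', 'b', 'c', 'd']

# A capturing group keeps the separators in the output list
print(re.split(r"([,;])", "a,b;c"))  # ['a', ',', 'b', ';', 'c']
```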

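VII. Greedy vs. Non-Greedy Matching

Greedy matching:

When the pattern can match, it matches the longest possible string.
This is the default mode in regular expressions.

Non-greedy matching:

When the pattern can match, it matches the shortest possible string.
Add a `?` after a quantifier (such as `*`, `+`, `?`, `{n,}`) to make it non-greedy.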

```
import re

# Greedy: match as much as possible (the default)
res = re.match("em*", "emmmmmmm……")
print(res.group())   # emmmmmmm
# Non-greedy: match as little as possible
res1 = re.match("em*?", "emmmmmmm……")
print(res1.group())  # e
res2 = re.match("m{1,5}", "mmmmmmm……")
print(res2.group())  # mmmmm
res3 = re.match("m{1,5}?", "mmmmmmm……")
print(res3.group())  # m
```
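The difference matters most when extracting delimited text such as HTML tags: a greedy `.*` runs on to the last `>` in the string, while `.*?` stops at the first one. A small sketch:

```python
import re

html = "<b>bold</b><i>italic</i>"
print(re.findall(r"<.*>", html))   # ['<b>bold</b><i>italic</i>']
print(re.findall(r"<.*?>", html))  # ['<b>', '</b>', '<i>', '</i>']
```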

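VIII. Raw Strings

In Python, prefixing a string with r makes it a raw string: backslashes are kept as literal characters instead of starting escape sequences.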

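```
import re

print(r"fives\tar")  # raw string, \t is not an escape: prints fives\tar
# Without r: the literal "\\\\" is the two characters \\, which the regex engine
# reads as one literal backslash, so it matches the \ in "\\game"
res = re.match("\\\\", "\\game")
# With r: r"\\\\" is four backslash characters, i.e. a regex matching two literal
# backslashes (in a raw regex, \\ stands for one backslash)
res1 = re.match(r"\\\\", r"\\game")
print(res.group())   # \
print(res1.group())  # \\
```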
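A raw string only changes how Python reads the literal; the regex engine receives the same pattern either way, which is why raw strings are the usual choice for regular expressions. A quick check:

```python
import re

print(r"\d" == "\\d")         # True: both are a backslash followed by d
print(len(r"\n"), len("\n"))  # 2 1: raw keeps the backslash, normal makes a newline
print(re.findall(r"\d", "a1b2") == re.findall("\\d", "a1b2"))  # True
```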


http://www.jsqmd.com/news/436445/