
python + word

news 2026/7/12 15:46:54

Merging the documents in one folder

A folder on my computer contains several Word files, and I want to merge them into one document.

The code is as follows:

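```python
import os
from docx import Document  # python-docx

folder = r"E:\【实践】FutureOfMyLife\【博士】中期考核\【★】真题及模拟"

mydocument = Document()  # create a Document object, i.e. a new Word document
for idocx in sorted(os.listdir(folder)):
    # read only .docx files; skip Word lock files (~$...) and a previous output file
    if not idocx.endswith(".docx") or idocx.startswith("~$") or idocx == "resulteng.docx":
        continue
    mydocument.add_heading(f"Article {idocx}", level=0)
    ifile = Document(os.path.join(folder, idocx))  # read the file with python-docx
    for p in ifile.paragraphs:
        mydocument.add_paragraph(p.text)  # one new paragraph per source paragraph (text only)
mydocument.save(os.path.join(folder, "resulteng.docx"))
```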

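However, the original formatting of these files is lost: their content ends up in the .docx file as plain text, as if it had been copied from a .txt file.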

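The code needs improving: it should merge multiple .docx files while preserving the original formatting (bold, font size, spacing, and so on).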

The improved code:

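```python
import os
from docx import Document


def merge_docx_files(folder_path, output_path):
    merged_doc = Document()
    output_name = os.path.basename(output_path)
    # sorted() fixes the merge order; skip lock files (~$...) and the output file itself
    names = sorted(
        n for n in os.listdir(folder_path)
        if n.endswith(".docx") and not n.startswith("~$") and n != output_name
    )
    for filename in names:
        sub_doc = Document(os.path.join(folder_path, filename))
        # Optional: use the file name as a heading
        merged_doc.add_heading(filename, level=1)
        for para in sub_doc.paragraphs:
            new_para = merged_doc.add_paragraph()
            copy_paragraph_format(para, new_para)
        # Optional: page break after each file
        merged_doc.add_page_break()
    merged_doc.save(output_path)
    print(f"Merge complete, output file: {output_path}")


def copy_paragraph_format(source_para, target_para):
    # Paragraph-level formatting: style, alignment, indents and spacing
    target_para.style = source_para.style  # copies the style ID; custom styles are not copied over
    target_para.alignment = source_para.alignment
    src_fmt, dst_fmt = source_para.paragraph_format, target_para.paragraph_format
    for attr in ("left_indent", "first_line_indent", "space_before",
                 "space_after", "line_spacing"):
        setattr(dst_fmt, attr, getattr(src_fmt, attr))
    for run in source_para.runs:
        new_run = target_para.add_run(run.text)
        copy_run_format(run, new_run)


def copy_run_format(source_run, target_run):
    # Character-level formatting
    target_run.bold = source_run.bold
    target_run.italic = source_run.italic
    target_run.underline = source_run.underline
    target_run.font.name = source_run.font.name
    if source_run.font.size:
        target_run.font.size = source_run.font.size
    if source_run.font.color.rgb is not None:
        target_run.font.color.rgb = source_run.font.color.rgb


# Example usage
folder_path = r"E:\【实践】FutureOfMyLife\【博士】中期考核\【★】真题及模拟"
output_path = r"E:\【实践】FutureOfMyLife\【博士】中期考核\【★】真题及模拟\合并结果.docx"
merge_docx_files(folder_path, output_path)
```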

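That does the job. Note that only body paragraphs are copied this way: tables, images, headers and footers in the source files are not carried over.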

Merging the documents in multiple folders

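First, use the os module to get the names of the files. Then read each file with the appropriate package and collect the contents in one place.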

Module used: os (part of Python's standard library, so nothing needs to be installed).
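Instead of moving into each folder with `os.chdir`, one can build full paths with `os.path.join` and check `os.path.isdir`, which leaves the working directory (and therefore where output files are saved) unchanged. A minimal sketch; `list_docx_by_subfolder` is a hypothetical helper name:

```python
import os


def list_docx_by_subfolder(root):
    """Return {subfolder name: [.docx file names]} for the folders directly under root."""
    result = {}
    for name in sorted(os.listdir(root)):
        sub_path = os.path.join(root, name)
        if not os.path.isdir(sub_path):
            continue  # skip loose files sitting directly in root
        result[name] = sorted(
            f for f in os.listdir(sub_path)
            if f.endswith(".docx") and not f.startswith("~$")  # ~$... are Word lock files
        )
    return result
```

For example, `list_docx_by_subfolder(r"E:\eng")` would map each subfolder of the eng folder to its Word files.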

Download the files from Baidu Netdisk to the computer.

Take the downloaded eng folder as an example: it contains many subfolders.
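If the subfolders are nested more than one level deep, `os.walk` visits every level. A sketch under that assumption; `find_docx` is a hypothetical name, and sorting `dirnames` in place fixes the order in which subfolders are visited:

```python
import os


def find_docx(root):
    """Collect the paths of all .docx files under root, at any depth."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk visit subfolders alphabetically
        for name in sorted(filenames):
            if name.endswith(".docx") and not name.startswith("~$"):
                paths.append(os.path.join(dirpath, name))
    return paths
```

`find_docx(r"E:\eng")` would return every Word file in the eng folder tree, ready to be read one by one.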

To merge the Word files downloaded from Baidu Netdisk into one, use the following code:

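```python
import os
import docx  # python-docx

root = r"E:\eng"  # the eng folder, which contains the subfolders

with open("result.txt", "w", encoding="utf-8") as wfile:
    subList = sorted(os.listdir(root))  # entries in the top-level eng folder
    for inum in subList:
        sub_path = os.path.join(root, inum)  # a subfolder of eng
        if not os.path.isdir(sub_path):
            continue
        for idocx in sorted(os.listdir(sub_path)):
            if not idocx.endswith(".docx") or idocx.startswith("~$"):
                continue
            print(f"Article {idocx}", file=wfile)  # header to tell the files apart
            ifile = docx.Document(os.path.join(sub_path, idocx))
            for p in ifile.paragraphs:
                print(p.text, file=wfile)
            print(file=wfile)  # blank line between files
```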

Explanation:

Step 1: put the following code in the program file (test.py, for example); it lists the files in each subfolder.

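```python
import os

root = r"E:\eng"
subList = os.listdir(root)  # entries in the top-level eng folder
for inum in subList:
    sub_path = os.path.join(root, inum)  # a subfolder of eng
    if os.path.isdir(sub_path):
        inumList = os.listdir(sub_path)
        print(inumList)  # file names in this subfolder
```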

Step 2: import the docx package. If it is not installed, install it with pip:

pip install python-docx

pip then reports that the installation succeeded.
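The package is installed under the name python-docx but imported as `docx`. A quick check that it can be imported, using only the standard library; `docx_installed` is a hypothetical helper name:

```python
import importlib.util


def docx_installed():
    """True if a module importable as "docx" (python-docx) is available."""
    return importlib.util.find_spec("docx") is not None


if docx_installed():
    print("python-docx is available")
else:
    print("Not found; run: pip install python-docx")
```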

Step 3: use the docx package to read the .docx files.

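```python
# Continues the script above: wfile is the open result.txt, subList the entries of E:\eng
for inum in subList:
    sub_path = os.path.join(r"E:\eng", inum)  # a subfolder of eng
    if not os.path.isdir(sub_path):
        continue
    for idocx in os.listdir(sub_path):
        if not idocx.endswith(".docx"):
            continue
        print(f"Article {idocx}", file=wfile)  # header to tell the files apart
        ifile = docx.Document(os.path.join(sub_path, idocx))  # read the file with python-docx
        for p in ifile.paragraphs:
            print(p.text, file=wfile)  # write each paragraph into result.txt
        print(file=wfile)
```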
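For the curious, a .docx file is a ZIP archive whose main text lives in `word/document.xml`, with paragraphs stored as `w:p` elements and text in `w:t` elements. The sketch below reads paragraph text with only the standard library; it is an alternative technique, not what python-docx does internally, and `read_docx_paragraphs` is a hypothetical name. It ignores tabs and line breaks, and unlike python-docx's `Document.paragraphs` it also picks up paragraphs inside table cells.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(path_or_file):
    """Return the plain text of each paragraph in a .docx (path or binary file object)."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # join the text of every w:t node inside this paragraph
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```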

Step 4: finally, copy the generated text from the .txt file and paste it into a .docx document, and the job is done.

The code can of course be improved further, for example by using the docx package to produce a Word document directly instead of a .txt file:

```python
import os
import docx
from docx import Document

mydocument = Document()  # create a Document object, i.e. a new Word document
myp = mydocument.add_paragraph(" ")
os.chdir(r"E:\eng")
subList = os.listdir(os.getcwd())  # the top-level eng folder
for inum in subList:
    os.chdir(os.path.join(r"E:\eng", inum))  # a subfolder of eng
    inumList = os.listdir(os.getcwd())
    for idocx in inumList:
        mydocument.add_heading(f"Article {idocx}", level=0)
        ifile = docx.Document(f"{idocx}")  # read the file with python-docx
        for p in ifile.paragraphs:
            line = p.text
            myp.add_run(line)  # add the paragraph text to the document
            myp.add_run("\n")
mydocument.save("resulteng.docx")
```

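In practice, however, this code did not work, and the improvement was not achieved. Two likely causes: every run is appended to the single paragraph myp created at the start, so all the text piles up in that first paragraph ahead of the headings instead of under them; and because of the os.chdir calls, resulteng.docx is saved in the last subfolder visited rather than in E:\eng. Any non-.docx file in a subfolder would also make docx.Document raise an error. I still need to learn more about the docx package to solve this properly.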

Tidying up a table of contents

Use Doubao or Kimi to generate a book's table of contents.

When it is pasted into Word, a problem appears: each chapter is on its own line, but the section titles within a chapter are run together.

In Python, how can the text be split into separate lines at section numbers such as 1.1, 3.1, 20.1 and 20.2?

A first attempt uses a regular expression:

```python
import re

with open("content.txt", "r", encoding="utf-8") as file:
    data = file.read()


def split_by_section(text, section_pattern=r"\d+\.\d+"):
    """
    Split a string at section numbers.
    section_pattern: regex for a section number; r'\d+\.\d+' matches "20.1", "1.1", etc.
    """
    # Split before each section number; the capture group keeps the numbers in the result
    pattern = f"({section_pattern})"
    parts = re.split(pattern, text)
    result = []
    # Re-join each number (odd index) with the text that follows it (even index)
    for i in range(1, len(parts), 2):
        if i + 1 < len(parts):
            content = (parts[i] + parts[i + 1]).strip()
            if content:
                result.append(content)
    return result


datalines = data.splitlines()
for line in datalines:
    if "第" in line:  # chapter lines such as "第1章 ..." ("Chapter 1 ...")
        print(line)
    sections = split_by_section(line)
    for m in sections:
        print(m)
```

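How the split works: when the pattern passed to re.split() contains a capture group, the separators are kept in the result at the odd indices (1, 3, 5, ...), and the text between them sits at the even indices (2, 4, 6, ...; index 0 holds whatever precedes the first section number, which this function discards). Hence the loop: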

```python
for i in range(1, len(parts), 2):  # re-join each odd-index number with the following even-index text
    if i + 1 < len(parts):
        content = (parts[i] + parts[i + 1]).strip()
```

strip() is a built-in Python string method that removes leading and trailing whitespace.
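To see the index pattern concretely, here is what `re.split` returns for a short made-up line, and how the odd/even recombination plus `strip()` turns it into clean section titles:

```python
import re

text = "1.1 Intro 1.2 Setup"
parts = re.split(r"(\d+\.\d+)", text)
print(parts)  # ['', '1.1', ' Intro ', '1.2', ' Setup']

# Odd indices hold the section numbers, the following even index holds the title text
sections = [(parts[i] + parts[i + 1]).strip() for i in range(1, len(parts) - 1, 2)]
print(sections)  # ['1.1 Intro', '1.2 Setup']
```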

That does it.
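A variant worth knowing (a different technique from the one above): since Python 3.7, `re.split` can split on a zero-width lookahead, so no recombination loop is needed. The `(?<!\d)` lookbehind is required, otherwise "20.1" would also be split between "2" and "0.1". Unlike the function above, this keeps any text before the first section number. `split_sections` is a hypothetical name, and like the original pattern it will also split at decimals such as "3.5" inside a title.

```python
import re


def split_sections(line):
    """Split a line before every section number such as 1.1 or 20.2 (Python 3.7+)."""
    pieces = re.split(r"(?<!\d)(?=\d+\.\d+)", line)
    return [p.strip() for p in pieces if p.strip()]
```

For example, `split_sections("Chapter 1 Overview 1.1 Background 1.2 Goals")` gives `['Chapter 1 Overview', '1.1 Background', '1.2 Goals']`.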

A drawing a day

The code:

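```python
import turtle

turtle.tracer(False)  # turn off animation so the drawing finishes quickly
colorlist = ['purple', 'blue', 'green', 'red', 'orange']
edge = 6  # number of sides: a hexagon
d = 0     # current heading in degrees
k = 1     # current side length
for j in range(1000):
    for i in range(edge):
        turtle.fd(k)
        d += 362 / edge  # about 60.33 degrees per side instead of 60
        turtle.pencolor(colorlist[j % 5])
        turtle.seth(d)
        k += 1  # each side slightly longer than the last
turtle.update()  # with the tracer off, redraw the screen once at the end
turtle.done()    # keep the window open
```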

[Figure: the resulting drawing]

Done.
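Why 362 rather than 360? The heading is set to d after every side, so after n sides it is:

```latex
\theta_n = n \cdot \frac{362^\circ}{6} \approx n \cdot 60.33^\circ,
\qquad
\theta_{n+6} - \theta_n = 362^\circ \equiv 2^\circ \pmod{360^\circ}
```

Each new hexagon therefore starts 2° further round than the previous one. Over the 1000 iterations of the outer loop this adds up to 1000 × 2° = 2000°, a little over five and a half full turns, which together with the growing side length k produces the twisted spiral. With exactly 360, every hexagon would be drawn along the same six directions.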


http://www.jsqmd.com/news/472675/