
Building an E-commerce Customer Service Assistant with Spring AI + RAG: A Full-Pipeline Walkthrough from PDF Parsing to Accurate Q&A

news 2026/3/27 3:28:06


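Abstract: This article uses Spring AI 1.0 (the listings pin the 1.0.0-M1 milestone) and Spring Boot 3.3 to build a production-oriented RAG (retrieval-augmented generation) system for e-commerce customer service from scratch. It covers PDF parsing, vector database selection, a dual-mode retrieval strategy, entity extraction, and answer source attribution, with accompanying code and a realistic e-commerce scenario.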

I. The Pain Point: Why Do Traditional Customer Service Systems Answer the Wrong Question?

1.1 A Real Scenario

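User: "I bought a lipstick during Double 11 and opened it, then found I don't like the color. Can I return it?"
Traditional keyword-matching bot:
❌ "Hi there, you can request after-sales service on the order page~"
❌ "Please refer to the official website for the return policy."
❌ "How can I help you?"
What the user is thinking: I just want to know whether opened cosmetics can be returned or not!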

1.2 Core Challenges in E-commerce Customer Service

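Challenge	What it looks like	Limitation of the traditional approach
Scattered documents	Return policies, shipping rules, and promotion terms are spread across many PDF/Word files	The knowledge base is maintained by hand and lags behind updates
Weak semantic understanding	"Can I return it after opening?" and "Can I return it after using it?" mean nearly the same thing	Keyword matching cannot recognize the similarity
No source attribution	Answers cannot point to the clause they come from	Users do not trust the answer and are more likely to complain
Blurry business boundary	The user asks "What's the weather like today?"	Off-topic questions cannot be filtered out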

1.3 Advantages of the RAG Architecture

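┌──────────────────────────────────────────────────────────────────────────┐
│ User question                                                            │
│ "Can I return a lipstick I bought on Double 11 after opening it?"        │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ ① Entity extraction     ② Vector retrieval      ③ Answer generation      │
│ Product: lipstick       Match return policy     Generate from clauses    │
│ Time: Double 11         Top 3 by similarity     Cite source chapter      │
│ Intent: return          Filter irrelevant text  Refuse to invent rules   │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Answer: Per Section 3.2 of the "E-commerce Return and Exchange Policy":  │
│ Opened cosmetics without quality defects are not eligible for 7-day      │
│ no-reason returns. If there is a quality problem, such as an allergic    │
│ reaction, you can submit proof and apply for a return.                   │
│                                                                          │
│ 📄 Source: "E-commerce Knowledge Base Standard Terms" - Chapter 3,       │
│ Return Rules for Special Goods                                           │
└──────────────────────────────────────────────────────────────────────────┘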
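The retrieval step in the diagram (rank by similarity, keep the top few, drop irrelevant text) can be illustrated without any framework. The following is a minimal in-memory sketch, assuming embeddings are already available as `double[]` vectors. The class `TopKRetriever` and its records are hypothetical names introduced for illustration; in the actual system this work would be delegated to the vector store.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory retriever; a real deployment would query the vector store instead. */
final class TopKRetriever {

    /** A stored chunk with a source label and its embedding vector. */
    record Chunk(String source, String text, double[] embedding) {}

    /** A retrieval hit: the chunk plus its similarity to the query. */
    record Hit(Chunk chunk, double score) {}

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps chunks scoring at or above the threshold, best first, at most topK of them. */
    static List<Hit> retrieve(double[] query, List<Chunk> chunks, int topK, double threshold) {
        List<Hit> hits = new ArrayList<>();
        for (Chunk chunk : chunks) {
            double score = cosine(query, chunk.embedding());
            if (score >= threshold) {
                hits.add(new Hit(chunk, score));
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(topK, hits.size())));
    }
}
```

Applying the threshold before the top-k cut means an off-topic question can legitimately return zero hits, which gives the answer-generation step a clear signal to decline rather than invent a rule.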

II. Technology Choices: Why Spring AI?

2.1 The Technology Stack at a Glance

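Core framework:
  - Spring Boot: 3.3.0
  - Spring AI: 1.0.0-M1
  - Java: 17+
Vector database:
  - PostgreSQL + pgvector (recommended for production)
  - Redis Stack (lightweight option)
  - ChromaDB (rapid prototyping)
Document parsing:
  - Apache PDFBox: PDF text extraction
  - Spring AI Document Readers: structured parsing
LLM:
  - Alibaba Cloud Bailian (DashScope)
  - OpenAI GPT-4
  - Local deployment: Qwen/ChatGLM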

2.2 Core Spring AI Components

/**
 * Spring AI RAG core component relationships
 *
 * DocumentReader → DocumentSplitter → VectorStore → RetrievalAugmentationAdvisor → ChatClient
 *        │                 │               │                      │                     │
 *   PDF parsing      Text chunking  Vector storage     Retrieval augmentation       LLM chat
 */
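The retrieval-augmentation stage in this chain boils down to merging the retrieved passages into the prompt before it reaches the chat model. Below is a minimal sketch of that idea in plain Java, assuming passages have already been retrieved. `GroundedPromptBuilder`, `Passage`, and the instruction wording are all hypothetical and are not part of Spring AI.

```java
import java.util.List;

/** Hypothetical helper that assembles a source-grounded prompt from retrieved passages. */
final class GroundedPromptBuilder {

    /** A retrieved passage together with a human-readable source label. */
    record Passage(String source, String text) {}

    /** Builds a prompt that numbers each passage so the model can cite it. */
    static String build(String question, List<Passage> passages) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer the customer using only the numbered passages below.\n")
              .append("Cite the passage number you relied on. If the passages do not ")
              .append("cover the question, say that you do not know.\n\n");
        for (int i = 0; i < passages.size(); i++) {
            Passage passage = passages.get(i);
            prompt.append('[').append(i + 1).append("] (")
                  .append(passage.source()).append(") ")
                  .append(passage.text()).append('\n');
        }
        prompt.append("\nQuestion: ").append(question).append('\n');
        return prompt.toString();
    }
}
```

Numbering the passages and carrying the source label alongside each one is what makes it possible to show the user which clause an answer came from.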

III. Project Setup: Building the RAG System from Scratch

3.1 Maven Dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.0</version>
    </parent>

    <groupId>com.example</groupId>
    <artifactId>ecommerce-rag-service</artifactId>
    <version>1.0.0</version>
    <name>E-commerce Intelligent Customer Service RAG System</name>

    <properties>
        <java.version>17</java.version>
        <!-- Milestone builds may need the Spring milestone repository; verify that this version resolves in your build -->
        <spring-ai.version>1.0.0-M1</spring-ai.version>
    </properties>

    <dependencies>
        <!-- Spring Boot core -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- Spring AI core -->
        <!-- Note: DashScope integration is provided by the Spring AI Alibaba project;
             verify these coordinates against that project's documentation -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-dashscope</artifactId>
            <version>${spring-ai.version}</version>
        </dependency>
        <!-- Note: the spring-ai-starter-* naming follows later Spring AI 1.0 releases;
             verify that it matches the version pinned above -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
            <version>${spring-ai.version}</version>
        </dependency>

        <!-- PDF parsing -->
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>3.0.1</version>
        </dependency>

        <!-- Entity extraction: LangChain4j as a helper -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
            <version>0.35.0</version>
        </dependency>

        <!-- Database -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
        </dependency>

        <!-- Utilities -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
    </dependencies>
</project>

3.2 Application Configuration

# application.yml
spring:
  application:
    name: ecommerce-rag-service
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY}
      chat:
        options:
          model: qwen-max
          temperature: 0.7
      embedding:
        options:
          model: text-embedding-v2
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536  # dimension of text-embedding-v2
  datasource:
    url: jdbc:postgresql://localhost:5432/ecommerce_rag
    username: postgres
    password: ${DB_PASSWORD}
  jpa:
    hibernate:
      ddl-auto: update

# Custom configuration
rag:
  document:
    upload-dir: /data/documents
    supported-types: pdf,docx,txt
    chunk-size: 500           # chunk size in characters
    chunk-overlap: 50         # overlap between chunks (reduces context loss)
  retrieval:
    top-k: 4                  # number of most similar documents to retrieve
    similarity-threshold: 0.6 # similarity threshold
    business-filter: true     # enable the business filter
  entity:
    extraction-enabled: true
    model: qwen-turbo         # use a lightweight model for entity extraction
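To make the `chunk-size` and `chunk-overlap` settings concrete, here is a minimal character-based sliding-window splitter. It is a sketch under the assumption that both values are counted in Java `char` units, as the config comments suggest. `OverlapChunker` is a hypothetical name and is not the splitter the article goes on to use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size splitter with overlap, counted in UTF-16 chars. */
final class OverlapChunker {

    /**
     * Splits text into windows of at most chunkSize chars, where each window
     * starts (chunkSize - overlap) chars after the previous one.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end; avoid a redundant tail chunk
            }
        }
        return chunks;
    }
}
```

With the values above, a 1,000-character document yields three chunks covering characters 0-499, 450-949, and 900-999, so a clause that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.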

IV. Core Module 1: PDF Parsing and Vectorization

4.1 Document Reader

package com.example.ecommercerag.reader;

import lombok.extern.slf4j.Slf4j;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.springframework.ai.document.Document;
import org.springframework.stereotype.Component;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/**
 * PDF document reader.
 * Supports batch parsing of e-commerce policy documents and extracts
 * metadata (chapter, page number, document name).
 */
@Component
@Slf4j
public class PdfDocumentReader {

    /**
     * Reads a single PDF file.
     */
    public Document read(File pdfFile) throws IOException {
        log.info("Parsing PDF file: {}", pdfFile.getName());

        try (PDDocument document = Loader.loadPDF(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStripper();
            String content = stripper.getText(document);

            // Build the metadata as a plain map, which Document's constructor accepts
            Map<String, Object> metadata = Map.of(
                    "filename", pdfFile.getName(),
                    "filepath", pdfFile.getAbsolutePath(),
                    "pageCount", document.getNumberOfPages(),
                    "fileSize", Files.size(pdfFile.toPath()),
                    "documentId", UUID.randomUUID().toString(),
                    "documentType", "ecommerce_policy" // marks this as an e-commerce policy document
            );

            return new Document(content, metadata);
        }
    }

    /**
     * Splits a PDF document by chapter.
     * E-commerce policy documents usually have a clear chapter structure,
     * and splitting along it can improve retrieval precision.
     */
    public List<Document> readByChapter(File pdfFile) throws IOException {
        List<Document> chapters = new ArrayList<>();

        try (PDDocument document = Loader.loadPDF(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStr
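The listing above breaks off inside `readByChapter`. Independently of it, the idea of splitting extracted text along chapter headings can be sketched on plain strings. This is an illustration only and not a reconstruction of the article's method. `ChapterSplitter`, `Chapter`, and the "(preamble)" label are hypothetical, and the heading pattern is supplied by the caller because the real heading format depends on the documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical splitter that cuts extracted text at lines matching a heading pattern. */
final class ChapterSplitter {

    /** A chapter title and the text that follows it up to the next heading. */
    record Chapter(String title, String body) {}

    /**
     * Splits text at every match of the heading pattern. The pattern is expected to
     * match one whole heading line, so it should be compiled with Pattern.MULTILINE.
     */
    static List<Chapter> split(String text, Pattern heading) {
        List<Chapter> chapters = new ArrayList<>();
        Matcher matcher = heading.matcher(text);
        String title = null; // heading of the chapter currently being collected
        int bodyStart = 0;
        while (matcher.find()) {
            add(chapters, title, text.substring(bodyStart, matcher.start()));
            title = matcher.group().strip();
            bodyStart = matcher.end();
        }
        add(chapters, title, text.substring(bodyStart));
        return chapters;
    }

    /** Adds a chapter, skipping an empty block that precedes the first heading. */
    private static void add(List<Chapter> out, String title, String body) {
        String trimmed = body.strip();
        if (title == null && trimmed.isEmpty()) {
            return;
        }
        out.add(new Chapter(title == null ? "(preamble)" : title, trimmed));
    }
}
```

For English-style headings a caller might pass `Pattern.compile("^Chapter \\d+.*$", Pattern.MULTILINE)`. Policy documents with headings such as "第三章 特殊商品退货规则" would need a pattern written for that format. Each resulting chapter could then be wrapped in a `Document` carrying the title as metadata.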


http://www.jsqmd.com/news/492656/