Building a Product Review Sentiment Analysis System with StructBERT: A Tutorial

news 2026/7/10 2:53:51

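1. Introduction

E-commerce platforms collect thousands of reviews for a single product, and reading them by hand to find out whether customers actually like it takes far too long. The same problem comes up when you want quick feedback on a newly launched product. This tutorial shows how to use a StructBERT model to build a sentiment analysis system that automatically labels each product review as positive or negative.

The tutorial is aimed at people who are new to AI projects and does not assume a deep technical background. It walks through the whole process step by step, from preparing data to visualizing results, with the code kept as simple as possible. By the end you will be able to build your own system for analyzing review sentiment automatically.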

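2. Environment Setup and Quick Deployment

2.1 Installing the required libraries

First install a few Python libraries. Open a command-line tool and run:

pip install modelscope pandas numpy matplotlib seaborn openpyxl

What each library is for:

modelscope: loads and runs the StructBERT model
pandas: data processing and analysis
numpy: numerical computation
matplotlib and seaborn: plotting the results
openpyxl: needed by pandas to write the .xlsx file in section 4.1

If loading the model later fails with an import error for a deep learning framework, install PyTorch as well.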
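
Before moving on, it can help to confirm that the installation worked. The sketch below is one way to do that using only the standard library; `check_libraries` is a hypothetical helper name, not part of any of the packages above.

```python
from importlib.util import find_spec


def check_libraries(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


required = ["modelscope", "pandas", "numpy", "matplotlib", "seaborn"]
for name, available in check_libraries(required).items():
    print(f"{name}: {'OK' if available else 'missing'}")
```

Any package reported as missing can be installed again with pip before continuing.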

2.2 Preparing the data

Sentiment analysis needs some review text to test with. We can start with a few hand-written samples:

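import pandas as pd

# Sample reviews, kept in Chinese because the model targets Chinese text
sample_comments = [
    "这个产品质量很好，用起来很舒服",  # good quality, comfortable to use
    "包装破损了，里面的东西都坏了",    # packaging damaged, contents broken
    "性价比很高，推荐购买",            # good value, recommended
    "服务态度很差，再也不买了",        # poor service, will not buy again
    "物流很快，第二天就到了",          # fast shipping, arrived the next day
    "颜色和图片差别很大，失望"         # color differs from the photos, disappointed
]

# Convert to a DataFrame for easier processing
comments_df = pd.DataFrame(sample_comments, columns=['comment'])
print(comments_df.head())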

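3. Sentiment Analysis with StructBERT

3.1 Loading the model

The checkpoint used here is a StructBERT model fine-tuned for Chinese sentiment classification. It was trained on roughly 115,000 samples and classifies the sentiment polarity of a text as positive or negative.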

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Create the sentiment classification pipeline (the model is downloaded on first use)
semantic_cls = pipeline(
    task=Tasks.text_classification,
    model='damo/nlp_structbert_sentiment-classification_chinese-base'
)

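3.2 Running sentiment analysis

Now let's test the model on one review:

# Test a single review
test_comment = "启动的时候很大声音，然后就会听到1.2秒的卡察的声音，类似齿轮摩擦的声音"
result = semantic_cls(test_comment)
print(f"Review: {test_comment}")
print(f"Sentiment result: {result}")

You will see output similar to this:

Review: 启动的时候很大声音，然后就会听到1.2秒的卡察的声音，类似齿轮摩擦的声音
Sentiment result: {'labels': ['负面'], 'scores': [0.98]}

The label '负面' means negative ('正面' means positive), so the model considers this review negative with about 98% confidence.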
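
The example output above shows a single label, but a classifier may return a score for every class. A minimal sketch, assuming the result is a dict with parallel `labels` and `scores` lists as shown above, picks the highest-scoring label explicitly and flags low-confidence predictions. The helper name `top_label`, the 0.6 threshold, and the "uncertain" marker are illustrative assumptions, not part of ModelScope.

```python
def top_label(result, threshold=0.6):
    """Return (label, score) for the highest score; the label is 'uncertain' below threshold."""
    labels, scores = result["labels"], result["scores"]
    if not labels or len(labels) != len(scores):
        raise ValueError("result needs parallel, non-empty 'labels' and 'scores' lists")
    best = max(range(len(scores)), key=lambda i: scores[i])
    label = labels[best] if scores[best] >= threshold else "uncertain"
    return label, scores[best]


# Hand-written result dicts for illustration; they are not real model output.
print(top_label({"labels": ["正面", "负面"], "scores": [0.02, 0.98]}))
print(top_label({"labels": ["正面", "负面"], "scores": [0.55, 0.45]}))
```

Selecting by maximum score avoids relying on the order in which labels happen to be returned.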

3.3 Analyzing reviews in bulk

In practice we need to analyze many reviews at once. The following code processes a whole list:

def analyze_comments(comments):
    """Classify each comment and return a DataFrame of labels and confidences."""
    results = []
    for comment in comments:
        try:
            result = semantic_cls(comment)
            results.append({
                'comment': comment,
                'sentiment': result['labels'][0],
                'confidence': result['scores'][0]
            })
        except Exception as e:
            # Record the failure and keep going so one bad row does not stop the batch
            print(f"Error analyzing comment: {comment}, error: {e}")
            results.append({
                'comment': comment,
                'sentiment': '未知',  # "unknown"
                'confidence': 0
            })
    return pd.DataFrame(results)

# Analyze all comments
results_df = analyze_comments(sample_comments)
print(results_df)
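
Real review sets often contain identical comments, and each model call is comparatively slow. One possible optimization, sketched here under the assumption that the classifier returns the same result for the same text, is to classify each distinct comment once and reuse the result. `classify` stands in for any callable such as `semantic_cls`; `classify_unique` and the `fake_classify` stub are hypothetical names used only for this illustration.

```python
def classify_unique(comments, classify):
    """Call `classify` once per distinct comment and return results in input order."""
    cache = {}
    results = []
    for comment in comments:
        if comment not in cache:
            cache[comment] = classify(comment)
        results.append(cache[comment])
    return results


# Stub classifier so the sketch runs without the model; it records each call.
calls = []


def fake_classify(text):
    calls.append(text)
    return {"labels": ["正面"], "scores": [0.9]}


out = classify_unique(["很好", "很好", "一般"], fake_classify)
print(len(out), len(calls))  # 3 results from 2 classifier calls
```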

4. Storing and Analyzing the Results

4.1 Saving the results

It is a good idea to save the analyzed data so you can review it later:

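# Save to a CSV file (utf-8-sig lets Excel display the Chinese text correctly)
results_df.to_csv('comment_sentiment_analysis.csv', index=False, encoding='utf-8-sig')

# Optionally save to Excel as well (requires an Excel writer such as openpyxl)
results_df.to_excel('comment_sentiment_analysis.xlsx', index=False)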

4.2 Simple statistics

Let's look at the overall picture:

# Count positive and negative reviews
sentiment_counts = results_df['sentiment'].value_counts()
print("Sentiment distribution:")
print(sentiment_counts)

# Compute the average confidence
avg_confidence = results_df['confidence'].mean()
print(f"\nAverage confidence: {avg_confidence:.2f}")
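
The overall mean can hide differences between classes. As a sketch, the per-label count and mean confidence can be computed from plain records, for example those produced by `results_df.to_dict('records')`. The function name `summarize_by_sentiment` and the sample records are assumptions for illustration.

```python
from collections import defaultdict


def summarize_by_sentiment(records):
    """Return {label: {'count': n, 'mean_confidence': m}} for a list of result dicts."""
    grouped = defaultdict(list)
    for row in records:
        grouped[row["sentiment"]].append(row["confidence"])
    return {
        label: {"count": len(vals), "mean_confidence": sum(vals) / len(vals)}
        for label, vals in grouped.items()
    }


# Made-up records, not real model output.
records = [
    {"sentiment": "正面", "confidence": 0.9},
    {"sentiment": "正面", "confidence": 0.7},
    {"sentiment": "负面", "confidence": 0.8},
]
summary = summarize_by_sentiment(records)
for label, stats in summary.items():
    print(label, stats["count"], round(stats["mean_confidence"], 2))
```

With pandas, `results_df.groupby('sentiment')['confidence'].agg(['count', 'mean'])` gives the same breakdown directly from the DataFrame.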

5. Visualizing the Results

5.1 Plotting the sentiment distribution

Charts make the results easier to read:

import matplotlib.pyplot as plt
import seaborn as sns

# Use a font that can render Chinese, since the sentiment labels are Chinese.
# If SimHei is not installed, substitute a CJK font available on your system.
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly

# Pie chart of the sentiment distribution
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
sentiment_counts.plot.pie(autopct='%1.1f%%', startangle=90)
plt.title('Review sentiment distribution')

# Histogram of the confidence scores
plt.subplot(1, 2, 2)
plt.hist(results_df['confidence'], bins=10, alpha=0.7, color='skyblue')
plt.xlabel('Confidence')
plt.ylabel('Count')
plt.title('Confidence distribution')

plt.tight_layout()
plt.savefig('sentiment_analysis_results.png', dpi=300, bbox_inches='tight')
plt.show()

5.2 Generating an analysis report

We can also generate a simple text report:

def generate_report(results_df):
    total_comments = len(results_df)
    if total_comments == 0:
        # Avoid dividing by zero when there is nothing to report
        return "No reviews to report."

    positive_comments = len(results_df[results_df['sentiment'] == '正面'])
    negative_comments = len(results_df[results_df['sentiment'] == '负面'])
    positive_ratio = positive_comments / total_comments * 100
    negative_ratio = negative_comments / total_comments * 100

    report = f"""
===== Sentiment Analysis Report =====
Total reviews: {total_comments}
Positive reviews: {positive_comments} ({positive_ratio:.1f}%)
Negative reviews: {negative_comments} ({negative_ratio:.1f}%)
Average confidence: {results_df['confidence'].mean():.2f}

Main issues:
"""

    # Add a few negative reviews as examples
    negative_examples = results_df[results_df['sentiment'] == '负面'].head(3)
    for _, row in negative_examples.iterrows():
        report += f"\n- {row['comment']} (confidence: {row['confidence']:.2f})"

    return report

print(generate_report(results_df))

6. Practical Advice

6.1 Working with real data

If you have real e-commerce review data, you can process it like this:

def analyze_real_data(file_path, comment_column='comment'):
    """
    Analyze a real review data file.

    file_path: path to the data file (CSV or Excel)
    comment_column: name of the column that holds the review text
    """
    # Read the data
    if file_path.endswith('.csv'):
        df = pd.read_csv(file_path)
    else:
        df = pd.read_excel(file_path)

    # Run sentiment analysis
    results = analyze_comments(df[comment_column].tolist())

    # Merge the original data with the analysis results
    final_df = pd.concat([df, results[['sentiment', 'confidence']]], axis=1)
    return final_df

# Usage example
# real_results = analyze_real_data('your_comments.csv', 'review_content')

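6.2 Improving accuracy

If some reviews are classified incorrectly, try the following:

Clean the data: remove irrelevant characters, emoji, and similar noise
Handle long text: split very long reviews into segments and analyze each one
Combine with rules: for clearly positive or negative words, add rules to assist the model's judgment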

def preprocess_comment(comment):
    """Simple review preprocessing."""
    # Collapse redundant whitespace
    comment = ' '.join(comment.split())
    # Add further preprocessing steps here
    return comment

# Preprocess the comments before analysis
preprocessed_comments = [preprocess_comment(comment) for comment in sample_comments]
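
The three suggestions above can be sketched with the standard library alone. Everything here is an illustrative assumption: the helper names, the regular expressions, the 100-character segment limit, the 0.7 score cutoff, and the tiny keyword lists would all need tuning on real data.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Keep CJK characters, ASCII letters/digits, whitespace and common punctuation; drop the rest (e.g. emoji).
NOISE_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？、；：,.!?;:]")
# A sentence is text followed by its closing punctuation, or trailing text without any.
SENTENCE_RE = re.compile(r"[^。！？!?；;]*[。！？!?；;]+|[^。！？!?；;]+")

NEGATIVE_WORDS = ("退货", "破损")  # hypothetical keyword list
POSITIVE_WORDS = ("推荐",)         # hypothetical keyword list


def clean_comment(text):
    """Remove URLs and unusual symbols, then collapse whitespace."""
    text = URL_RE.sub("", text)
    text = NOISE_RE.sub("", text)
    return " ".join(text.split())


def split_long_comment(text, max_len=100):
    """Split text at sentence boundaries into segments of at most max_len characters."""
    segments, current = [], ""
    for sentence in SENTENCE_RE.findall(text):
        if current and len(current) + len(sentence) > max_len:
            segments.append(current)
            current = ""
        current += sentence
        # A single sentence longer than max_len is cut at the limit.
        while len(current) > max_len:
            segments.append(current[:max_len])
            current = current[max_len:]
    if current:
        segments.append(current)
    return segments


def rule_override(text, label, score, min_score=0.7):
    """Keep the model's label when it is confident; otherwise let keywords decide."""
    if score >= min_score:
        return label
    has_neg = any(word in text for word in NEGATIVE_WORDS)
    has_pos = any(word in text for word in POSITIVE_WORDS)
    if has_neg and not has_pos:
        return "负面"
    if has_pos and not has_neg:
        return "正面"
    return label


print(clean_comment("物流很快😀  见 http://example.com 第二天就到了！"))
print(split_long_comment("很好。" * 3, max_len=4))
print(rule_override("包装破损了", "正面", 0.55))
```

Each segment returned by `split_long_comment` can be passed to the classifier separately; how to combine the per-segment results into one label is left open here.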