
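Get Over Your Fear of the Command Line: Write an Automated Slacking-Off Script in Python That Fetches News and Fund Data on a Schedule (Source Code Included)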

news 2026/7/10 3:02:36

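Building a Smart Slacking-Off Assistant in Python: Fetching News and Fund Data on a Schedule

I recently came across an entertaining discussion in a tech community: how can programmers slack off efficiently at work? ("Slacking off" here translates the Chinese slang 摸鱼, literally "touching fish", which is why the project files below are named after fish.) Someone joked that the real experts do not dodge work; they use technology to turn slacking off into self-improvement. Today we will put that into practice with a project that is both useful and fun: a Python script that fetches news and fund data on a schedule and alerts you through desktop notifications.

This project is a good fit for developers who want to sharpen their Python skills. By putting libraries such as requests and BeautifulSoup to real use, you will learn web scraping and also how to implement scheduled tasks and desktop notifications. As long as you only fetch publicly available pages and respect each site's crawling policy (see section 6), nothing here involves sensitive operations.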

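1. Environment Setup and Basic Configuration

Before writing the script, we need a working development environment. Python 3.8 or later is recommended, as a reasonable balance between stability and newer language features.

First, install the required dependencies: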

pip install requests beautifulsoup4 plyer schedule

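Each library has its own job:

requests: sends HTTP requests and retrieves page content
beautifulsoup4: parses HTML documents and extracts the data we need
plyer: provides cross-platform desktop notifications
schedule: sets up scheduled tasks so the script runs automatically

If you use VSCode, you can install the Python extension for a better development experience. Search for "Python" in the extension marketplace and install the official one published by Microsoft.

Tip: create a virtual environment to manage the project's dependencies so that you do not pollute the global Python environment. You can create one with the command python -m venv myenv.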

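2. Implementing News Fetching

News fetching is one of our core features. We will pull the headlines from a few mainstream news sites and extract each item's title and summary.

2.1 Analyzing the Structure of a News Site

Taking a news site as an example, we can inspect its HTML structure with the browser's developer tools. Press F12 to open the developer tools and locate the HTML element that contains the news list. The list usually sits inside a <div> or <ul> tag, and each news entry may carry a class such as news-item.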

    import requests
    from bs4 import BeautifulSoup


    def fetch_news():
        """Fetch the latest headlines as a list of {"title", "summary"} dicts."""
        url = "https://example-news-site.com/latest"
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "html.parser")
            news_items = []
            for item in soup.select(".news-item"):
                title = item.select_one(".title")
                summary = item.select_one(".summary")
                if title is None:
                    continue  # skip entries that do not match the expected layout
                news_items.append({
                    "title": title.get_text(strip=True),
                    "summary": summary.get_text(strip=True) if summary else "",
                })
            return news_items
        except requests.RequestException as e:
            print(f"Failed to fetch news: {e}")
            return []

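2.2 Handling Anti-Scraping Measures

Many sites have anti-scraping measures, so we need to take a few precautions to avoid getting blocked:

Set reasonable request headers that resemble a browser's
Control the request rate; do not send requests too frequently
Use proxy IPs (only if you need to fetch at large scale)
Handle the various HTTP status codes and exceptions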

    import time

    import requests


    def safe_fetch(url, retry=3):
        """GET a URL with up to `retry` attempts; returns the response or None."""
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
            "Accept-Language": "zh-CN,zh;q=0.9",
        }
        for _ in range(retry):
            try:
                response = requests.get(url, headers=headers, timeout=8)
                if response.status_code == 200:
                    return response
                if response.status_code == 429:
                    time.sleep(10)  # rate limited: wait longer before retrying
                    continue
                print(f"Request failed, status code: {response.status_code}")
            except requests.RequestException as e:
                print(f"Request error: {e}")
            time.sleep(2)
        return None
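
The fixed waits above (2 seconds, or 10 seconds after a 429) could be replaced by exponential backoff, which is one common way to "control the request rate". The following is a minimal sketch rather than part of the original script; the function name backoff_delays and its parameters are assumptions made for illustration.

```python
import random


def backoff_delays(retries, base=2.0, cap=60.0, jitter=0.0, rng=random.random):
    """Return the wait in seconds before each retry: base * 2**attempt, capped.

    `jitter` adds up to that fraction of the delay on top, so that several
    clients do not all retry at the same moment. All names are illustrative.
    """
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays


# With the defaults the wait doubles on every attempt
print(backoff_delays(4))
```

Inside safe_fetch, the loop could call time.sleep with the delay for the current attempt instead of the constant 2.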

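3. Fetching and Analyzing Fund Data

Besides news, many developers also care about how their funds are doing. We can fetch real-time data from a fund platform and run a simple analysis on it.

3.1 Getting Real-Time Fund Data

Fund data is usually served through API endpoints. We can find those endpoints and send requests to them directly: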

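    import requests


    def fetch_fund_data(fund_code):
        """Query the (placeholder) fund API; returns a dict, or None on failure."""
        api_url = f"https://api.example-fund.com/v3/fund/{fund_code}"
        try:
            response = requests.get(
                api_url,
                headers={"Referer": "https://www.example-fund.com/"},
                timeout=5,
            )
            response.raise_for_status()
            data = response.json()
            return {
                "name": data["name"],
                "current": data["current_price"],
                "change": data["change_percent"],
                "time": data["update_time"],
            }
        except (requests.RequestException, ValueError, KeyError) as e:
            # ValueError covers a body that is not JSON; KeyError a missing field
            print(f"Failed to fetch fund data: {e}")
            return None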
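
Because the endpoint above is a placeholder, it can help to test the field mapping without any network access. The sketch below separates parsing from fetching; the field names come from the listing above, while the function name parse_fund_payload and the sample payload are assumptions made for illustration.

```python
import json

REQUIRED_FIELDS = ("name", "current_price", "change_percent", "update_time")


def parse_fund_payload(raw_text):
    """Map a JSON response body to the dict shape used by fetch_fund_data.

    Returns None when the body is not valid JSON or a required field is missing.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(k not in data for k in REQUIRED_FIELDS):
        return None
    return {
        "name": data["name"],
        "current": data["current_price"],
        "change": data["change_percent"],
        "time": data["update_time"],
    }


# Hypothetical payload; a real API may use different field names
sample = (
    '{"name": "Demo Fund", "current_price": 1.2345, '
    '"change_percent": "-3.21%", "update_time": "2026-07-10 15:00"}'
)
print(parse_fund_payload(sample))
```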

3.2 Fund Data Analysis and Alerts

We can define conditions so that a notification is sent when a fund's gain or loss reaches a threshold:

    from plyer import notification


    def check_fund_alert(fund_data, threshold=0.03):
        """Send a desktop notification when the move reaches the threshold."""
        if not fund_data:
            return
        # "change" is expected to look like "1.23%" or "-0.45%"
        change = float(str(fund_data["change"]).strip().rstrip("%")) / 100
        if abs(change) >= threshold:
            direction = "up" if change > 0 else "down"
            notification.notify(
                title=f"Fund alert: {fund_data['name']}",
                message=(
                    f"Current price: {fund_data['current']}, "
                    f"{direction} {abs(change) * 100:.2f}%"
                ),
                timeout=10,
            )
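
The threshold logic is easier to check when it is kept apart from the notification call. The sketch below returns the title and message instead of displaying them; parse_change and build_alert are hypothetical helper names, and the "1.23%" input format is an assumption about what the fund API returns.

```python
def parse_change(raw):
    """Convert '3.25%', '-1.2 %' or a bare number into a fraction (0.0325)."""
    text = str(raw).strip().rstrip("%").strip()
    return float(text) / 100


def build_alert(fund_data, threshold=0.03):
    """Return (title, message) when the move reaches the threshold, else None."""
    try:
        change = parse_change(fund_data["change"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or malformed change value: no alert
    if abs(change) < threshold:
        return None
    direction = "up" if change > 0 else "down"
    title = f"Fund alert: {fund_data.get('name', 'unknown')}"
    message = (
        f"Current price: {fund_data.get('current', 'n/a')}, "
        f"{direction} {abs(change) * 100:.2f}%"
    )
    return title, message


fund = {"name": "Demo Fund", "current": 1.05, "change": "-3.50%"}
print(build_alert(fund))
```

The returned pair can then be passed to notification.notify, which keeps the only side effect in one place.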

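4. Scheduled Tasks and System Integration

To make the script run automatically, we need to set up a scheduled task. Python's schedule library is well suited to this.

4.1 Setting Up the Scheduled Task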

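    import time

    import schedule
    from plyer import notification


    def show_news_notification(item):
        # Helper not shown in the original listing; a minimal version is assumed here
        notification.notify(title=item["title"], message=item["summary"], timeout=10)


    def job():
        print("Running the slacking-off job...")
        news = fetch_news()
        if news:
            show_news_notification(news[0])  # show the first (latest) item
        # 161725 is 招商中证白酒 (China Merchants CSI Baijiu index fund), used as the example
        fund_data = fetch_fund_data("161725")
        if fund_data:
            check_fund_alert(fund_data)


    # Run every 30 minutes
    schedule.every(30).minutes.do(job)

    while True:
        schedule.run_pending()
        time.sleep(1)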
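
One thing to watch in a loop like this: the schedule library does not catch exceptions raised inside a job, so a single unexpected error would propagate out of run_pending() and end the script. A minimal sketch of a guard decorator follows; the name keep_running and the logger name are assumptions, and flaky_job is only a stand-in for job().

```python
import functools
import logging

logger = logging.getLogger("fish")


def keep_running(func):
    """Log any exception raised by a job instead of letting it end the loop."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("job %s failed; it will run again next time", func.__name__)
            return None
    return wrapper


@keep_running
def flaky_job():
    # Stand-in for job(); raises to show the wrapper absorbing the error
    raise RuntimeError("network down")


flaky_job()  # logs the traceback and returns None instead of raising
```

Decorating job() the same way before passing it to schedule.every(30).minutes.do(...) would keep the loop alive across transient failures.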

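4.2 System Tray Integration (Windows Example)

For tighter integration, we can have the script live in the system tray. This uses PyQt5, which is not in the dependency list from section 1 and has to be installed separately (pip install PyQt5):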

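    import sys

    import schedule
    from PyQt5.QtCore import QTimer
    from PyQt5.QtGui import QIcon
    from PyQt5.QtWidgets import QApplication, QMenu, QSystemTrayIcon

    app = QApplication(sys.argv)
    app.setQuitOnLastWindowClosed(False)  # there is no window; keep running in the tray

    tray = QSystemTrayIcon(QIcon("fish.ico"), parent=app)
    tray.setToolTip("Smart Slacking-Off Assistant")

    menu = QMenu()
    exit_action = menu.addAction("Exit")
    exit_action.triggered.connect(app.quit)
    tray.setContextMenu(menu)
    tray.show()

    # A blocking `while True` loop would freeze the tray icon, so poll the
    # schedule from a Qt timer instead (register the job as in section 4.1).
    timer = QTimer()
    timer.timeout.connect(schedule.run_pending)
    timer.start(1000)  # interval in milliseconds

    sys.exit(app.exec_())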

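5. Advanced Features and Optimization

With the basics in place, we can add a few enhancements to take the slacking-off experience up a notch.

5.1 Aggregating News from Multiple Sources

A single news source may not give the full picture, so we can combine several: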

    NEWS_SOURCES = [
        {
            "name": "Source A",
            "url": "https://news-a.com/latest",
            "title_selector": ".news-title",
            "summary_selector": ".description",
        },
        {
            "name": "Source B",
            "url": "https://api.news-b.com/v1/articles",
            "is_api": True,
            "title_key": "title",
            "summary_key": "abstract",
        },
    ]


    def fetch_multi_news():
        all_news = []
        for source in NEWS_SOURCES:
            # fetch_api_news and fetch_html_news are per-source helpers (not shown);
            # each should return a list of dicts that include a "time" key.
            if source.get("is_api"):
                data = fetch_api_news(source)
            else:
                data = fetch_html_news(source)
            all_news.extend(data)
        # Sort by time, newest first, and keep the top five
        return sorted(all_news, key=lambda x: x.get("time", ""), reverse=True)[:5]
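
When several sources carry the same story, the combined list may contain duplicates. The sketch below merges per-source lists, drops repeated titles, and sorts newest first. It assumes each item is a dict with a "title" and, ideally, an ISO 8601 "time" string; merge_news is a hypothetical helper name.

```python
def merge_news(batches, limit=5):
    """Flatten per-source lists, drop repeated titles, newest first."""
    seen = set()
    merged = []
    for batch in batches:
        for item in batch:
            key = item["title"].strip().lower()
            if key in seen:
                continue  # same headline already collected from another source
            seen.add(key)
            merged.append(item)
    # ISO 8601 strings in the same timezone sort chronologically as text;
    # items without a time get "" and therefore end up last.
    merged.sort(key=lambda x: x.get("time", ""), reverse=True)
    return merged[:limit]


source_a = [
    {"title": "Alpha", "time": "2026-07-10T09:00:00"},
    {"title": "Beta", "time": "2026-07-10T08:00:00"},
]
source_b = [
    {"title": "alpha ", "time": "2026-07-10T09:05:00"},
    {"title": "Gamma"},
]
print([n["title"] for n in merge_news([source_a, source_b])])
```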

5.2 Data Persistence and History

Use SQLite to store historical data for later analysis:

    import sqlite3
    from contextlib import closing

    DB_PATH = "fish.db"


    def init_db():
        """Create the news and funds tables if they do not exist yet."""
        with closing(sqlite3.connect(DB_PATH)) as conn:
            conn.execute("""CREATE TABLE IF NOT EXISTS news
                (id INTEGER PRIMARY KEY AUTOINCREMENT,
                 title TEXT, content TEXT, source TEXT,
                 timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")
            conn.execute("""CREATE TABLE IF NOT EXISTS funds
                (id INTEGER PRIMARY KEY AUTOINCREMENT,
                 code TEXT, name TEXT, price REAL, change REAL,
                 timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")
            conn.commit()


    def save_news(news_item):
        """Insert one news item; the source defaults to "example"."""
        with closing(sqlite3.connect(DB_PATH)) as conn:
            conn.execute(
                "INSERT INTO news (title, content, source) VALUES (?, ?, ?)",
                (news_item["title"], news_item["summary"],
                 news_item.get("source", "example")),
            )
            conn.commit()
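
The listing above creates a funds table but only shows how to save news. A matching writer for fund snapshots might look like the sketch below. The schema is copied from init_db; save_fund and latest_price are hypothetical names, and storing the change as a fraction (0.0125 for 1.25%) is an assumption. The demo uses an in-memory database so it leaves no file behind.

```python
import sqlite3

FUNDS_SCHEMA = """CREATE TABLE IF NOT EXISTS funds
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
     code TEXT, name TEXT, price REAL, change REAL,
     timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)"""


def save_fund(conn, code, fund_data):
    """Insert one fund snapshot, with the change stored as a fraction."""
    change = float(str(fund_data["change"]).strip().rstrip("%")) / 100
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO funds (code, name, price, change) VALUES (?, ?, ?, ?)",
            (code, fund_data["name"], float(fund_data["current"]), change),
        )


def latest_price(conn, code):
    """Return the most recently stored price for a fund code, or None."""
    row = conn.execute(
        "SELECT price FROM funds WHERE code = ? ORDER BY id DESC LIMIT 1",
        (code,),
    ).fetchone()
    return row[0] if row else None


# Usage against a throwaway in-memory database
demo = sqlite3.connect(":memory:")
demo.execute(FUNDS_SCHEMA)
save_fund(demo, "161725", {"name": "Demo Fund", "current": "1.20", "change": "1.25%"})
save_fund(demo, "161725", {"name": "Demo Fund", "current": "1.25", "change": "-0.50%"})
print(latest_price(demo, "161725"))
demo.close()
```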

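5.3 Visualizing the Data

Use matplotlib to generate a simple trend chart (this also needs pandas; install both with pip install matplotlib pandas):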

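    import sqlite3

    import matplotlib.pyplot as plt
    import pandas as pd


    def plot_fund_trend(fund_code, days=7):
        """Plot the daily average price over the last `days` days to fund_trend.png."""
        conn = sqlite3.connect("fish.db")
        try:
            # Bound parameters instead of string formatting avoid SQL injection.
            # Filtering by timestamp (stored in UTC by CURRENT_TIMESTAMP) does not
            # depend on how many samples are recorded per day.
            df = pd.read_sql_query(
                """SELECT date(timestamp) AS date, price FROM funds
                   WHERE code = ? AND timestamp >= datetime('now', ?)
                   ORDER BY timestamp""",
                conn,
                params=(fund_code, f"-{days} days"),
            )
        finally:
            conn.close()
        if df.empty:
            return
        df["date"] = pd.to_datetime(df["date"])
        daily_avg = df.groupby("date")["price"].mean()
        plt.figure(figsize=(10, 5))
        daily_avg.plot()
        plt.title(f"Fund {fund_code}: trend over the last {days} days")
        plt.xlabel("Date")
        plt.ylabel("Price")
        plt.grid()
        plt.savefig("fund_trend.png")
        plt.close()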

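6. Safety and Optimization Advice

There are several important points to keep in mind when building this kind of automation tool:

Respect robots.txt: check the target site's robots.txt file before fetching, and honor its crawling policy
Use reasonable intervals: add an appropriate delay between requests so you do not burden the server
Error handling: handle exceptions thoroughly so that one error does not stop the whole script
Resource management: close database connections and file handles promptly to avoid leaks
User agent: identify your crawler with a reasonable User-Agent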

    import random
    import time
    from urllib.parse import urlsplit
    from urllib.robotparser import RobotFileParser

    import requests


    # Example of good crawling practice
    def responsible_crawler(url):
        try:
            # Check robots.txt first
            parts = urlsplit(url)
            robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
            robots_resp = requests.get(robots_url, timeout=5)
            if robots_resp.status_code == 200:
                rp = RobotFileParser()
                rp.parse(robots_resp.text.splitlines())
                if not rp.can_fetch("*", url):
                    print(f"robots.txt does not allow fetching: {url}")
                    return None
            # Add a random delay
            time.sleep(random.uniform(1, 3))
            # Set reasonable request headers
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
                "Accept-Encoding": "gzip, deflate",
                "Connection": "keep-alive",
            }
            response = requests.get(url, headers=headers, timeout=8)
            response.raise_for_status()
            return response.text
        except Exception as e:
            print(f"Error while fetching: {e}")
            return None
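
The robots.txt check can be tried offline by feeding RobotFileParser a piece of text directly. The sketch below uses a made-up robots.txt and also reads its Crawl-delay directive, which some sites declare; build_parser, polite_delay, and the sample rules are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return rp


def polite_delay(rp, user_agent="*", default=1.0):
    """Use the site's Crawl-delay when it declares one, else a default."""
    delay = rp.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


rp = build_parser(SAMPLE_ROBOTS)
print(rp.can_fetch("*", "https://example.com/latest"))
print(rp.can_fetch("*", "https://example.com/private/report"))
print(polite_delay(rp))
```

In responsible_crawler, the random 1 to 3 second pause could be replaced by the value from polite_delay when the site asks for a longer one.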

7. Complete Code Structure and Deployment

With all the modules put together, the project is laid out as follows:

smart-fish/
├── main.py            # main entry point
├── config.py          # configuration
├── news_crawler.py    # news fetching module
├── fund_crawler.py    # fund data module
├── notification.py    # notifications
├── scheduler.py       # scheduled tasks
├── db/                # database files
├── utils/             # utility functions
└── requirements.txt   # dependency list
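
The article does not show the contents of config.py. One possible shape, gathering the values that appear scattered through the earlier listings (fund code 161725, the 30-minute interval, the 0.03 threshold), is sketched below. The Settings class, its field names, and the db/fish.db path are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Central place for tunable values; every field name is illustrative."""
    fund_codes: tuple = ("161725",)
    interval_minutes: int = 30
    alert_threshold: float = 0.03  # 0.03 means a 3% move
    db_path: str = "db/fish.db"
    request_timeout: int = 10

    def __post_init__(self):
        # Fail early on values that would make the scheduler or alerts misbehave
        if self.interval_minutes < 1:
            raise ValueError("interval_minutes must be at least 1")
        if not 0 < self.alert_threshold < 1:
            raise ValueError("alert_threshold must be a fraction between 0 and 1")


settings = Settings()
print(settings.interval_minutes, settings.alert_threshold)
```

Other modules could then import settings from config instead of repeating these literals.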

For a long-running script, deploying it as a system service is recommended:

Windows: