Python in Practice: Predicting Lottery Win Probability with LSTM and Logistic Regression (Full Code Included)

news 2026/7/3 3:46:32

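Lottery prediction is a perennial topic of interest among data science enthusiasts. Mathematically, a lottery draw is a random event, but that does not stop us from using machine learning models to look for patterns in the data, as long as the results are read with that in mind. This article implements two classic models in Python, an LSTM and logistic regression, with complete code.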

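1. Data Preparation and Preprocessing

Before modeling, we need a suitable dataset and some preprocessing. Lottery data typically has two key fields: the date and the draw result.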

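    import pandas as pd
    import numpy as np

    # Load the raw data
    df = pd.read_excel('lottery_data.xlsx')
    print(df.head())

    # Data cleaning
    # Drop rest-day records ('休息日' = rest day); copy to avoid SettingWithCopyWarning
    df = df[df['结果'] != '休息日'].copy()
    # Encode the result column ('结果') as 0/1: '中奖' (win) -> 1, anything else -> 0
    df['结果'] = (df['结果'] == '中奖').astype(int)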

Key preprocessing steps:

Standardizing date formats
Handling outliers
Feature engineering (e.g. adding lag features)
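A minimal sketch of these three steps on a small made-up frame. The `结果` column follows the article's code; the `日期` (date) column name, the `未中奖` (no win) label, the sample rows, and the choice to treat unrecognized labels as outliers are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw rows; column '日期' and label '未中奖' are assumed names.
raw = pd.DataFrame({
    "日期": ["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04", "not a date"],
    "结果": ["中奖", "未中奖", "中奖", "???", "中奖"],
})

# 1. Date standardization: parse strings; unparseable values become NaT.
raw["日期"] = pd.to_datetime(raw["日期"], errors="coerce")

# 2. Outlier handling: keep only rows with a valid date and a known label.
known_labels = {"中奖", "未中奖"}
clean = raw[raw["日期"].notna() & raw["结果"].isin(known_labels)].copy()
clean = clean.sort_values("日期").reset_index(drop=True)
clean["结果"] = (clean["结果"] == "中奖").astype(int)

# 3. Feature engineering: lag features built from earlier draws only.
for lag in (1, 2):
    clean[f"lag_{lag}"] = clean["结果"].shift(lag)

print(clean)
```

Sorting by date before calling `shift` matters: lag features are only meaningful when the rows are in chronological order.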

Note: in practice, run a more thorough exploratory data analysis (EDA) first, including checks on the data distribution and missing values.
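A few basic EDA checks might look like the sketch below. The sample rows, the `日期` column name, and the `未中奖` label are made up; in the article `df` comes from `lottery_data.xlsx`.

```python
import pandas as pd

# Hypothetical sample standing in for the real spreadsheet.
df_demo = pd.DataFrame({
    "日期": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "结果": ["中奖", "未中奖", None, "未中奖"],
})

# Missing values per column.
missing_per_column = df_demo.isna().sum()

# Share of each label, counting missing values as their own group.
label_share = df_demo["结果"].value_counts(normalize=True, dropna=False)

# Time span covered by the data.
date_range = (df_demo["日期"].min(), df_demo["日期"].max())

print(missing_per_column)
print(label_share)
print(date_range)
```

The label share is worth noting early: if wins are rare, accuracy alone will look good for a model that never predicts a win.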

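2. Building and Training the LSTM Model

LSTM (long short-term memory) networks are well suited to time-series data because they can capture long-range dependencies in a sequence.

2.1 Data Preparation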

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    # Prepare the training data
    data = df['结果'].values
    n_steps = 5  # use the previous 5 draws to predict the next one

    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i:i + n_steps])
        y.append(data[i + n_steps])

    # Keras expects input of shape (samples, time steps, features)
    X = np.array(X, dtype='float32').reshape(-1, n_steps, 1)
    y = np.array(y)
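To see what the windowing loop produces, here is the same logic on a made-up seven-element sequence with a window of 3. The names `toy`, `n_steps_toy`, `X_toy`, and `y_toy` are illustrative only.

```python
import numpy as np

toy = np.array([0, 1, 0, 0, 1, 1, 0])  # made-up outcomes
n_steps_toy = 3

windows, targets = [], []
for i in range(len(toy) - n_steps_toy):
    windows.append(toy[i:i + n_steps_toy])  # the 3 previous results
    targets.append(toy[i + n_steps_toy])    # the result that follows them

X_toy = np.array(windows).reshape(-1, n_steps_toy, 1)
y_toy = np.array(targets)

print(X_toy.shape)       # (4, 3, 1)
print(X_toy[0].ravel())  # [0 1 0]
print(y_toy)             # [0 1 1 0]
```

A sequence of length 7 yields 7 - 3 = 4 samples, each pairing three consecutive results with the one that comes next.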

2.2 Model Architecture

    model = Sequential()
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, 1)))
    model.add(Dense(1, activation='sigmoid'))  # outputs P(win) for the next draw
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

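2.3 Training and Evaluation

    import matplotlib.pyplot as plt

    # validation_split holds out the last 20% of the samples for validation
    history = model.fit(X, y, epochs=100, batch_size=32, validation_split=0.2)

    # Keep a separate reference to the LSTM: section 3.2 reuses the name `model`
    lstm_model = model

    # Plot the training curves
    plt.plot(history.history['accuracy'], label='train')
    plt.plot(history.history['val_accuracy'], label='validation')
    plt.legend()
    plt.show()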

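LSTM tuning tips:

Adjust the window length (n_steps)
Try different numbers of LSTM units
Experiment with different activation functions
Tune the batch size and the number of epochs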
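One way to organize these experiments is a small grid search that rebuilds the windows for each `n_steps` and scores every combination on held-out data. The sketch below is framework-agnostic: `make_windows`, `grid_search`, and `placeholder_score` are hypothetical helpers, and `placeholder_score` only stands in for "build the LSTM with these parameters, fit it, and return validation accuracy".

```python
from itertools import product

import numpy as np


def make_windows(series, n_steps):
    """Build (samples, n_steps, 1) inputs and next-step targets from a 1-D series."""
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = np.asarray(series[n_steps:])
    return X.reshape(-1, n_steps, 1), y


def grid_search(series, grid, train_and_score):
    """Score every combination in `grid` and return (best_score, best_params)."""
    names = list(grid)
    best_score, best_params = -np.inf, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        X, y = make_windows(series, params["n_steps"])
        score = train_and_score(X, y, params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def placeholder_score(X, y, params):
    # Stand-in for real training. It ignores X and most of params and returns
    # the accuracy of predicting the majority class on the last 20% of samples.
    split = int(len(y) * 0.8)
    majority = int(y[:split].mean() >= 0.5)
    return float((y[split:] == majority).mean())


rng = np.random.default_rng(0)
series = rng.integers(0, 2, size=300)  # synthetic outcomes
grid = {"n_steps": [3, 5, 10], "units": [32, 50], "batch_size": [16, 32]}

best_score, best_params = grid_search(series, grid, placeholder_score)
print(best_score, best_params)
```

In a real run, `train_and_score` would train the Keras model from section 2.2 with `params["units"]` and `params["batch_size"]`; scoring on the last 20% mirrors what `validation_split=0.2` does.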

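3. Implementing the Logistic Regression Model

Logistic regression is simple, but it often provides a solid baseline on binary classification problems.

3.1 Feature Engineering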

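    # Create lag features: lag_i is the result from i draws earlier
    for i in range(1, 6):
        df[f'lag_{i}'] = df['结果'].shift(i)

    df = df.dropna()  # drop rows containing NaN (the first 5 have incomplete lags)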

3.2 Model Training

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X = df[['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5']]
    y = df['结果']

    # shuffle=False keeps the split chronological: train on earlier draws, test on later ones
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    # Note: this rebinds `model`, which section 2 used for the LSTM
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    print(f"Training accuracy: {model.score(X_train, y_train):.2f}")
    print(f"Test accuracy: {model.score(X_test, y_test):.2f}")

3.3 Model Interpretation

One advantage of logistic regression is its interpretability. We can inspect the feature coefficients:

    feature_importance = pd.DataFrame({
        'feature': X.columns,
        'coefficient': model.coef_[0],
    }).sort_values('coefficient', ascending=False)
    print(feature_importance)
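Coefficients are easier to read as odds ratios: `exp(coefficient)` is the factor by which the odds of a win change when that lag feature goes from 0 to 1. The sketch below uses synthetic, independent outcomes (an assumption, not the article's data), so the coefficients should land near 0 and the odds ratios near 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
outcomes = pd.Series(rng.integers(0, 2, size=2000))  # synthetic, independent draws

# Same lag layout as in section 3.1: lag_i is the result from i draws earlier.
lags = pd.DataFrame({f"lag_{i}": outcomes.shift(i) for i in range(1, 6)}).dropna()
target = outcomes.loc[lags.index]

clf = LogisticRegression(max_iter=1000).fit(lags, target)

odds = pd.DataFrame({
    "feature": lags.columns,
    "coefficient": clf.coef_[0],
    "odds_ratio": np.exp(clf.coef_[0]),
})
print(odds)
```

An odds ratio close to 1 means the feature barely moves the prediction, which is what independent draws would produce.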

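4. Model Comparison and Practical Use

4.1 Performance Comparison

Metric	LSTM	Logistic regression
Training accuracy	0.72	0.68
Test accuracy	0.65	0.63
Training time	Longer	Shorter
Interpretability	Low	High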
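Accuracy figures like these are easier to interpret next to a trivial baseline: when the classes are imbalanced, always predicting the most frequent label can score just as high. A minimal sketch, with made-up labels and a hypothetical helper name:

```python
import numpy as np


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    values, counts = np.unique(y_train, return_counts=True)
    majority = values[np.argmax(counts)]
    return float(np.mean(np.asarray(y_test) == majority))


# Made-up labels: 30% wins in both splits.
y_train_demo = np.array([1] * 30 + [0] * 70)
y_test_demo = np.array([1] * 6 + [0] * 14)

baseline = majority_baseline_accuracy(y_train_demo, y_test_demo)
print(baseline)  # 0.7
```

A model only adds value to the extent that its test accuracy exceeds this number.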

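4.2 A Prediction Example

    # Predict with the LSTM.
    # Section 3.2 rebinds `model` to the logistic regression, so the LSTM is
    # assumed to have been kept under its own name, e.g. `lstm_model = model`
    # right after training in section 2.3.
    last_sequence = data[-n_steps:]  # the most recent n_steps results
    prediction = lstm_model.predict(
        np.array(last_sequence, dtype='float32').reshape(1, n_steps, 1)
    )
    print(f"LSTM predicted win probability for the next draw: {prediction[0][0]:.2f}")

    # Predict with logistic regression (`model` from section 3.2).
    # lag_1 is the most recent result, lag_5 the oldest.
    lag_cols = [f'lag_{i}' for i in range(1, 6)]
    last_results = pd.DataFrame(
        [[df['结果'].iloc[-i] for i in range(1, 6)]], columns=lag_cols
    )
    lr_pred = model.predict_proba(last_results)
    print(f"Logistic regression predicted win probability: {lr_pred[0][1]:.2f}")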

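4.3 Suggestions for Improving the Models

Feature engineering:
- Add more statistical features (e.g. moving averages)
- Consider date-related features (e.g. day of the week)
Model improvements:
- Try variants such as GRU
- Use ensemble methods to improve on logistic regression
Evaluation metrics:
- Look at precision and recall in addition to accuracy
- Consider evaluating the model with ROC curves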
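The feature and metric suggestions above can be sketched together as follows. The data is synthetic (independent coin flips on consecutive days), and the `日期` column name, the 7-draw window, and the 80/20 chronological split are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(42)
n = 600
demo = pd.DataFrame({
    "日期": pd.date_range("2023-01-01", periods=n, freq="D"),
    "结果": rng.integers(0, 2, size=n),  # synthetic, independent outcomes
})

# Statistical feature: mean of the previous 7 outcomes.
# shift(1) keeps the current draw out of its own feature.
demo["rolling_mean_7"] = demo["结果"].shift(1).rolling(7).mean()
# Date feature: day of week (0 = Monday). Used as a plain number for brevity;
# one-hot encoding is the more usual choice.
demo["weekday"] = demo["日期"].dt.dayofweek
demo = demo.dropna()

features = demo[["rolling_mean_7", "weekday"]]
labels = demo["结果"]
split = int(len(demo) * 0.8)  # chronological split

clf = LogisticRegression(max_iter=1000).fit(features.iloc[:split], labels.iloc[:split])
proba = clf.predict_proba(features.iloc[split:])[:, 1]
pred = (proba >= 0.5).astype(int)

precision = precision_score(labels.iloc[split:], pred, zero_division=0)
recall = recall_score(labels.iloc[split:], pred, zero_division=0)
auc = roc_auc_score(labels.iloc[split:], proba)
print(f"precision={precision:.2f} recall={recall:.2f} ROC AUC={auc:.2f}")
```

ROC AUC is computed from the predicted probabilities rather than the thresholded labels, so it does not depend on the 0.5 cutoff; a value near 0.5 means the model ranks wins no better than chance.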

Tip: a lottery draw remains a random event. Model predictions are for reference only and should not be used as a basis for actual betting.

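5. Extensions and Advanced Techniques

5.1 Ensemble Learning

To combine the strengths of the LSTM and logistic regression, we can try a model ensemble:

    from sklearn.ensemble import VotingClassifier
    from tensorflow.keras.layers import Input, Reshape
    # tensorflow.keras.wrappers.scikit_learn was removed in recent TensorFlow releases;
    # this version assumes the separate scikeras package is installed, in a
    # version compatible with your scikit-learn and Keras.
    from scikeras.wrappers import KerasClassifier

    # Build the Keras classifier. It takes the same 2-D lag features as the
    # logistic regression (n_steps columns) and reshapes them to (n_steps, 1).
    def create_lstm_model():
        model = Sequential()
        model.add(Input(shape=(n_steps,)))
        model.add(Reshape((n_steps, 1)))
        model.add(LSTM(50, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(optimizer='adam', loss='binary_crossentropy')
        return model

    lstm_clf = KerasClassifier(model=create_lstm_model, epochs=50, batch_size=32, verbose=0)
    lr_clf = LogisticRegression(max_iter=1000)

    # Soft voting averages the predicted probabilities of both models
    ensemble = VotingClassifier(
        estimators=[('lstm', lstm_clf), ('lr', lr_clf)],
        voting='soft',
    )
    ensemble.fit(X_train, y_train)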
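Soft voting itself is just an average of the models' predicted probabilities, optionally weighted. If wiring a Keras model into scikit-learn is inconvenient, the same combination can be computed by hand from each model's outputs. In this sketch `soft_vote` is a hypothetical helper and the probabilities are made up.

```python
import numpy as np


def soft_vote(prob_lists, weights=None):
    """Average positive-class probabilities across models (rows = models)."""
    probs = np.asarray(prob_lists, dtype=float)  # shape: (n_models, n_samples)
    return np.average(probs, axis=0, weights=weights)


# Made-up win probabilities for three draws from each model.
lstm_probs = np.array([0.60, 0.40, 0.55])
lr_probs = np.array([0.50, 0.30, 0.65])

avg = soft_vote([lstm_probs, lr_probs])
labels = (avg >= 0.5).astype(int)

print(avg)
print(labels)  # [1 0 1]
```

Passing `weights=[2, 1]` would count the first model twice as heavily as the second.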

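5.2 Probability Calibration

For probability prediction, it matters that the predicted probabilities are calibrated, meaning that among draws given a predicted probability of about 0.3, roughly 30% are actual wins:

    from sklearn.calibration import CalibratedClassifierCV

    # Isotonic calibration with 3-fold cross-validation
    calibrated_lr = CalibratedClassifierCV(lr_clf, method='isotonic', cv=3)
    calibrated_lr.fit(X_train, y_train)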
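Whether calibration helped can be checked with the Brier score (lower is better) and a reliability curve. The sketch below uses synthetic data in which the true win probability of each sample is known, and compares well-calibrated predictions with deliberately overconfident ones; all names and numbers are illustrative.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(1)
p_true = rng.uniform(0.1, 0.9, size=5000)                 # true win probabilities
y_obs = (rng.uniform(size=5000) < p_true).astype(int)     # outcomes drawn from them

calibrated = p_true
# Push probabilities away from 0.5 to mimic an overconfident model.
overconfident = np.clip(0.5 + 2.0 * (p_true - 0.5), 0.01, 0.99)

brier_calibrated = brier_score_loss(y_obs, calibrated)
brier_overconfident = brier_score_loss(y_obs, overconfident)

# Reliability curve: observed win rate vs. mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_obs, calibrated, n_bins=5)

print(f"Brier (calibrated):    {brier_calibrated:.3f}")
print(f"Brier (overconfident): {brier_overconfident:.3f}")
print(np.round(frac_pos, 2), np.round(mean_pred, 2))
```

For a calibrated model the two arrays from `calibration_curve` track each other closely; on real data, pass held-out labels and `predict_proba(...)[:, 1]` instead of the synthetic arrays.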

5.3 Automating Model Training

Use a Pipeline to simplify the workflow:

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('model', LogisticRegression()),
    ])
    pipeline.fit(X_train, y_train)
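A pipeline also plugs directly into cross-validation. For sequential data, `TimeSeriesSplit` keeps every validation fold later in time than its training fold. A minimal sketch on synthetic outcomes, with the lag matrix built in NumPy (the variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
series = rng.integers(0, 2, size=500)  # synthetic outcomes
n_lags = 5

# Column k-1 holds the result from k draws before each target.
X_demo = np.column_stack(
    [series[n_lags - k: len(series) - k] for k in range(1, n_lags + 1)]
)
y_demo = series[n_lags:]

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

cv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="accuracy")
print(scores, scores.mean())
```

Because the scaler lives inside the pipeline, it is refit on each training fold, so no statistics from the validation fold leak into training.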

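In real projects, I have found that the quality of feature engineering often matters more than the choice of model. With sensible lag and statistical features, even a simple logistic regression can deliver decent predictive performance.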
