Playing with Game Theory in Python: Visualizing Five Classic Models
Game theory is often treated as the "mathematical black box" of economics: abstract payoff matrices and equilibrium concepts scare beginners away. But the first time I simulated the Prisoner's Dilemma in Python and watched both players' strategies evolve, those flat 2-D tables suddenly came alive. The line chart jumping across the screen showed exactly how defection turns from an accidental choice into an inevitable outcome. That is the magic programming brings to learning game theory: it turns static theory into dynamic experiments.
1. Environment Setup and Basic Tools
Before building any game models, set up a suitable Python environment. I recommend creating an isolated environment with Anaconda:
```bash
conda create -n game_theory python=3.8
conda activate game_theory
pip install numpy matplotlib seaborn
```

Core toolchain:
- NumPy: payoff-matrix computations
- Matplotlib: basic plotting
- Seaborn: polished statistical charts
Tip: Jupyter Notebook is an ideal experimentation environment, since it supports running code cell by cell with instant visualization.
The basic elements of a game can be abstracted as a Python class:
```python
class NormalFormGame:
    def __init__(self, players, strategies, payoff_matrix):
        self.players = players        # list of players
        self.strategies = strategies  # dict of strategies {player: [strategy1, ...]}
        self.payoff = payoff_matrix   # 3-D array [player][row_strat][col_strat]
```

2. The Prisoner's Dilemma: The Inevitability of Defection
This classic model is a perfect illustration of how individual rationality leads to a collectively suboptimal outcome. First, define the payoff matrix:
| | Opponent cooperates | Opponent defects |
|---|---|---|
| I cooperate | (-1, -1) | (-3, 0) |
| I defect | (0, -3) | (-2, -2) |
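Before simulating anything, defection's dominance can be read straight off this matrix. A minimal NumPy check (the array layout mirrors the table above; the variable names are my own):

```python
import numpy as np

# Row player's payoffs: rows = my move (0 = Cooperate, 1 = Defect),
# columns = opponent's move
my_payoff = np.array([[-1, -3],
                      [ 0, -2]])

# For each opponent column, which of my rows pays more?
best_response = my_payoff.argmax(axis=0)
print(best_response)  # [1 1]: Defect is the best reply to both opponent
                      # moves, i.e. a strictly dominant strategy
```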
Simulating the strategies in Python:
```python
import random

def prisoner_dilemma(rounds=100, noise=0.05, seed=42):
    # Both players use tit-for-tat: cooperate if the opponent cooperated
    # last round, otherwise defect.  A small noise term makes each player
    # occasionally defect by accident -- without it, two tit-for-tat
    # players would simply cooperate forever.
    random.seed(seed)
    history = {'A': [], 'B': []}
    for _ in range(rounds):
        a_move = 'Cooperate' if (not history['B'] or history['B'][-1] == 'Cooperate') else 'Defect'
        b_move = 'Cooperate' if (not history['A'] or history['A'][-1] == 'Cooperate') else 'Defect'
        if random.random() < noise:
            a_move = 'Defect'
        if random.random() < noise:
            b_move = 'Defect'
        history['A'].append(a_move)
        history['B'].append(b_move)
    return history
```

Visualizing the result, you can watch defection take over: a single accidental defection triggers chains of retaliation, and once both players defect in the same round, tit-for-tat locks them into mutual defection for good.
```python
import matplotlib.pyplot as plt

results = prisoner_dilemma()
rounds = range(len(results['A']))
plt.plot(rounds, [1 if x == 'Defect' else 0 for x in results['A']], label='Player A')
plt.plot(rounds, [1 if x == 'Defect' else 0 for x in results['B']], label='Player B')
plt.ylabel('Defect=1, Cooperate=0')
plt.title("Prisoner's Dilemma Strategy Evolution")
plt.legend()
plt.show()
```

3. Battle of the Sexes: The Charm of Coordination Games
Unlike the Prisoner's Dilemma, the Battle of the Sexes illustrates a different game structure, one with multiple Nash equilibria. The typical payoff matrix (rows: player 1, columns: player 2):
| | Football | Opera |
|---|---|---|
| Football | (3, 2) | (0, 0) |
| Opera | (0, 0) | (2, 3) |
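Both coordinated outcomes in this table are pure-strategy Nash equilibria, which a brute-force check confirms (a quick sketch; the 3-D payoff-array indexing convention here is my own):

```python
import numpy as np

# payoff[i][a][b]: player i's payoff when P1 plays a and P2 plays b
# 0 = Football, 1 = Opera
payoff = np.array([[[3, 0], [0, 2]],   # player 1
                   [[2, 0], [0, 3]]])  # player 2

pure_nash = []
for a in range(2):
    for b in range(2):
        # (a, b) is a Nash equilibrium if neither player gains by deviating
        p1_ok = payoff[0, a, b] >= payoff[0, 1 - a, b]
        p2_ok = payoff[1, a, b] >= payoff[1, a, 1 - b]
        if p1_ok and p2_ok:
            pure_nash.append((a, b))

print(pure_nash)  # [(0, 0), (1, 1)]: both coordinated outcomes are equilibria
```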
Computing the mixed-strategy equilibrium:
```python
from scipy.optimize import fsolve

def battle_of_sexes():
    # Solve the two indifference conditions for the mixed-strategy
    # Nash equilibrium.
    def equations(p):
        q, r = p  # q = P(player 1 plays Football), r = P(player 2 plays Football)
        eq1 = 3*r - 2*(1 - r)  # player 1 indifferent between Football and Opera
        eq2 = 2*q - 3*(1 - q)  # player 2 indifferent between Football and Opera
        return [eq1, eq2]

    q, r = fsolve(equations, (0.5, 0.5))
    # Player 2's Opera probability is 1 minus their Football probability
    return {'Player1_Football': q, 'Player2_Opera': 1 - r}
```

Visualizing the equilibrium in strategy space:
```python
equilibrium = battle_of_sexes()
plt.scatter([0, 1, 0, 1], [0, 0, 1, 1], c='gray', label='Pure Strategies')
plt.scatter(equilibrium['Player1_Football'], equilibrium['Player2_Opera'],
            c='red', s=100, label='Mixed Nash Equilibrium')
plt.xlabel('P(Football) for Player 1')
plt.ylabel('P(Opera) for Player 2')
plt.title('Battle of the Sexes Strategy Space')
plt.legend()
plt.show()
```

4. The Hawk-Dove Game: Evolutionarily Stable Strategies
This model explains how aggressive and yielding strategies evolve in conflicts over a resource of value V when fighting costs C. The payoff matrix (entries are the row player's payoff):
| | Hawk | Dove |
|---|---|---|
| Hawk | (V - C)/2 | V |
| Dove | 0 | V/2 |
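The mixed equilibrium can be derived by hand from this table: a hawk's expected payoff equals a dove's when the hawk fraction p satisfies p(V - C)/2 + (1 - p)V = (1 - p)V/2, which gives p* = V/C, valid only when C > V. A quick numeric check (the helper functions here are illustrative, not part of the simulation below):

```python
def hawk_fraction_ess(V, C):
    """Interior equilibrium hawk share p* = V/C, valid only when C > V."""
    assert C > V, "with C <= V, Hawk dominates and the population goes all-hawk"
    return V / C

def expected_payoffs(p, V, C):
    # Expected payoff of each strategy against a population with hawk share p
    hawk = p * (V - C) / 2 + (1 - p) * V
    dove = (1 - p) * V / 2
    return hawk, dove

p_star = hawk_fraction_ess(V=4, C=6)
hawk, dove = expected_payoffs(p_star, V=4, C=6)
print(p_star)                     # 0.6666666666666666
print(abs(hawk - dove) < 1e-12)   # True: both strategies earn the same
```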
Simulating the population dynamics (note that the interesting mixed outcome requires C > V; when C <= V, Hawk simply dominates):
```python
def hawk_dove_simulation(generations=100, initial_pop=(50, 50), V=4, C=6, baseline=10):
    # Discrete replicator dynamics.  The constant baseline fitness keeps
    # all payoffs positive so the multiplicative update stays well defined.
    population = list(initial_pop)
    history = []
    for _ in range(generations):
        total = sum(population)
        p_hawk, p_dove = population[0] / total, population[1] / total
        hawk_payoff = baseline + p_hawk * (V - C) / 2 + p_dove * V
        dove_payoff = baseline + p_dove * V / 2
        avg_payoff = p_hawk * hawk_payoff + p_dove * dove_payoff
        # Replicator equation: each strategy grows in proportion to its
        # fitness relative to the population average
        population = [population[0] * hawk_payoff / avg_payoff,
                      population[1] * dove_payoff / avg_payoff]
        history.append(population.copy())
    return history
```

Plotting the strategy shares over time:
```python
history = hawk_dove_simulation()
hawks = [x[0] for x in history]
doves = [x[1] for x in history]
plt.stackplot(range(len(history)), hawks, doves,
              labels=['Hawks', 'Doves'], colors=['firebrick', 'steelblue'])
plt.legend(loc='upper right')
plt.title('Evolution of Hawk-Dove Population')
plt.show()
```

5. The Public Goods Game: The Dilemma of Collective Action
Here we simulate an n-player public-goods investment problem to expose free riding: each player keeps whatever they do not contribute, while contributions are multiplied and shared equally among everyone.
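The free-rider logic is visible in the payoff rule itself, endowment - c + multiplier * total / n: each unit I contribute returns only multiplier/n to me personally. A quick arithmetic check with the defaults used below (the `my_payoff` helper is mine, for illustration only):

```python
endowment, multiplier, n = 10, 2, 5

def my_payoff(c, others):
    # My payoff when I contribute c and everyone else contributes `others` in total
    return endowment - c + multiplier * (c + others) / n

# Marginal private return of contributing one extra unit
marginal = my_payoff(1, 0) - my_payoff(0, 0)
print(round(marginal, 2))  # -0.6: each contributed unit costs me 0.6 net,
                           # even though the group as a whole gains 2 per unit
```

The full simulation below lets contributions vary randomly each round so the pattern shows up in the data.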
```python
import random

def public_goods_game(players=5, rounds=10, endowment=10, multiplier=2):
    contributions = {i: [] for i in range(players)}
    for _ in range(rounds):
        # Each player contributes a random amount between 0 and endowment
        current_round = [random.randint(0, endowment) for _ in range(players)]
        total = sum(current_round)
        # Payoff: keep what you didn't contribute, plus an equal share
        # of the multiplied pot
        payoff = [endowment - c + total * multiplier / players
                  for c in current_round]
        for i in range(players):
            contributions[i].append((current_round[i], payoff[i]))
    return contributions
```

Analyzing the relationship between contribution and payoff:
```python
results = public_goods_game()
plt.figure(figsize=(10, 6))
for player in results:
    x = [point[0] for point in results[player]]  # contributions
    y = [point[1] for point in results[player]]  # payoffs
    plt.scatter(x, y, label=f'Player {player + 1}')
# Baseline: contribute nothing and receive no public good
plt.plot([0, 10], [10, 10], 'k--', label='Endowment (no public good)')
plt.xlabel('Individual Contribution')
plt.ylabel('Individual Payoff')
plt.title('Public Goods Game Outcomes')
plt.legend()
plt.show()
```

6. Advanced Experiments: A Custom Game-Analysis Framework
Building an extensible game simulator:
```python
import random

class GameSimulator:
    def __init__(self, payoff_matrices, move_names=('Cooperate', 'Defect')):
        self.payoffs = payoff_matrices  # payoffs[i][my_move][opponent_move]
        self.move_names = list(move_names)
        self.strategies = []
        self.history = []

    def add_strategy(self, strategy_func):
        """Register a custom strategy function: history -> move name."""
        self.strategies.append(strategy_func)

    def run(self, iterations=100):
        # Two-player version: player i's opponent is player 1 - i
        for _ in range(iterations):
            moves = [strat(self.history) for strat in self.strategies]
            idx = [self.move_names.index(m) for m in moves]
            outcomes = [self.payoffs[i][idx[i]][idx[1 - i]]
                        for i in range(len(self.strategies))]
            self.history.append((moves, outcomes))
        return self.history

# Example: tit-for-tat, written from player 0's perspective
def tit_for_tat(history):
    if not history:
        return 'Cooperate'
    opponent_last = history[-1][0][1]
    # 10% noise: occasionally cooperate regardless of the opponent
    return opponent_last if random.random() > 0.1 else 'Cooperate'
```

Visualizing the long-run performance of different strategies:
```python
# always_defect, prisoner_dilemma_matrix and plot_strategy_performance
# are left as exercises for the reader
simulator = GameSimulator(prisoner_dilemma_matrix)
simulator.add_strategy(tit_for_tat)
simulator.add_strategy(always_defect)
results = simulator.run()
plot_strategy_performance(results)
```

After running these experiments, I found that the most effective way to learn is to compare theoretical predictions against simulation results. In the Prisoner's Dilemma, theory predicts that both sides will defect; the simulation shows, round by round, how that outcome actually unfolds. And when you change the parameters (for example, the number of repeated interactions), you can observe the conditions under which cooperation might emerge, which is far more interesting than memorizing conclusions.
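That closing point is easy to test. A self-contained sketch (independent of the code above; `coop_rate` is my own throwaway helper) measuring how often two tit-for-tat players sustain mutual cooperation as accidental defections become more frequent:

```python
import random

def coop_rate(rounds, noise, seed=0):
    """Two tit-for-tat players with accidental defections at rate `noise`;
    returns the fraction of rounds with mutual cooperation."""
    random.seed(seed)
    a_last = b_last = 'C'
    mutual = 0
    for _ in range(rounds):
        a = 'D' if random.random() < noise else b_last  # copy opponent's last move
        b = 'D' if random.random() < noise else a_last
        mutual += (a == b == 'C')
        a_last, b_last = a, b
    return mutual / rounds

for noise in (0.0, 0.05, 0.2):
    print(noise, round(coop_rate(1000, noise), 2))
```

With zero noise, cooperation holds forever; as the error rate rises, mutual defection increasingly dominates the history, matching the theoretical story.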
