
[Kafka Source Code Walkthrough and Usage Guide] Part 80: Kafka Partition Reassignment in Practice: Partition Load Balancing Without the Headache

news 2026/6/16 4:27:02


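Abstract

The cluster has been running for six months, and Broker 1's disk is 80% full while Broker 3's is only 30%: partition data is badly unbalanced. Or you have added two brokers, but they sit empty while all the data stays on the old nodes.

This is where partition reassignment comes in. It raises questions, though: will moving partitions saturate the network and hurt production traffic? What happens if a broker goes down halfway through? This article starts with the scenarios that call for reassignment, then walks through the kafka-reassign-partitions script step by step, from generating the JSON plan to executing it and monitoring it to completion. It also covers production impact assessment and throttling strategy, so you can move partitions around without taking the service down.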

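1. When Do You Need Partition Reassignment?

Scenario 1: After adding brokers

[Partition distribution before and after adding a broker]

Before (3 brokers):
  Broker 1: P0  P1  P2  P3  P4     (load of 5 leader partitions)
  Broker 2: P5  P6  P7  P8  P9     (5)
  Broker 3: P10 P11 P12 P13 P14    (5)

After adding Broker 4:
  Broker 1: P0  P1  P2  P3  P4     ← load unchanged
  Broker 2: P5  P6  P7  P8  P9     ← load unchanged
  Broker 3: P10 P11 P12 P13 P14    ← load unchanged
  Broker 4: (empty)                ← the new node sits idle

→ Reassignment needed: move some partitions to Broker 4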
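A quick way to spot this situation is to count partitions per broker from an assignment. The sketch below is a minimal illustration, not part of any Kafka tooling: the function names and the sample layout are hypothetical, and it treats the first entry of each `replicas` list as the (preferred) leader, which may differ from the current leader on a live cluster.

```python
from collections import Counter


def leaders_per_broker(partitions, brokers):
    """Count preferred-leader partitions per broker (replicas[0] is the preferred leader)."""
    counts = Counter(p["replicas"][0] for p in partitions)
    return {b: counts.get(b, 0) for b in brokers}


def idle_brokers(partitions, brokers):
    """Return brokers that host no replica at all."""
    used = {b for p in partitions for b in p["replicas"]}
    return [b for b in brokers if b not in used]


# Hypothetical layout mirroring the diagram: 15 single-replica partitions on brokers 1-3.
layout = [{"topic": "orders", "partition": i, "replicas": [i // 5 + 1]} for i in range(15)]

print(leaders_per_broker(layout, [1, 2, 3, 4]))
print(idle_brokers(layout, [1, 2, 3, 4]))
```

Any broker returned by `idle_brokers` is a candidate target for the reassignment plan.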

Scenario 2: Unbalanced disk usage

Broker 1 disk: ████████████████░░░░ 80%  ← almost full
Broker 2 disk: ████████░░░░░░░░░░░░ 40%
Broker 3 disk: ██████░░░░░░░░░░░░░░ 30%

→ Reassignment needed: move some of Broker 1's partitions to Brokers 2 and 3

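Scenario 3: Decommissioning a broker

Broker 3 is about to be taken offline (hardware failure or lease expiry).
Every partition replica on Broker 3 must be moved to Brokers 1 and 2.

→ Make sure the data has been fully migrated before Broker 3 goes offline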
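A decommissioning plan can be derived from the current assignment by swapping the retiring broker out of every replica list. The sketch below is one possible approach under stated assumptions: `drain_broker` is a hypothetical helper, the replacement is picked as the least-loaded remaining broker that does not already hold a replica of that partition, and disk size and rack placement are ignored.

```python
import copy


def drain_broker(plan, retiring, remaining):
    """Return a new plan where `retiring` is replaced by the least-loaded eligible broker."""
    new_plan = copy.deepcopy(plan)

    # Current replica count on each remaining broker.
    load = {b: 0 for b in remaining}
    for p in new_plan["partitions"]:
        for b in p["replicas"]:
            if b in load:
                load[b] += 1

    for p in new_plan["partitions"]:
        if retiring not in p["replicas"]:
            continue
        # A broker may hold at most one replica of a given partition.
        candidates = [b for b in remaining if b not in p["replicas"]]
        if not candidates:
            raise ValueError(f"no spare broker for {p['topic']}-{p['partition']}")
        target = min(candidates, key=lambda b: (load[b], b))
        p["replicas"] = [target if b == retiring else b for b in p["replicas"]]
        load[target] += 1
    return new_plan


# Hypothetical current assignment with replication factor 2.
current = {"version": 1, "partitions": [
    {"topic": "orders", "partition": 0, "replicas": [3, 1]},
    {"topic": "orders", "partition": 1, "replicas": [2, 3]},
    {"topic": "orders", "partition": 2, "replicas": [1, 2]},
]}

drained = drain_broker(current, retiring=3, remaining=[1, 2])
print([p["replicas"] for p in drained["partitions"]])
```

Note that with only two brokers left, a topic with replication factor 3 cannot be drained this way; the helper raises an error rather than silently dropping a replica.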

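Scenario 4: Improving leader distribution

Current leader distribution:
  Broker 1: 10 leaders (the most heavily loaded node)
  Broker 2: 3 leaders
  Broker 3: 2 leaders

→ Leaders are unbalanced and Broker 1 is the bottleneck
→ Fix with partition reassignment plus automatic leader rebalancing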

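2. Reassignment Tooling and Workflow

Core tool

[The kafka-reassign-partitions script]

Three core modes:
  1. --generate   → generate a reassignment plan (JSON)
  2. --execute    → execute the plan
  3. --verify     → check reassignment progress

Supporting options:
  --cancel                    → cancel an in-progress reassignment
  --bootstrap-server          → Kafka address
  --reassignment-json-file    → the reassignment plan JSON file
  --throttle                  → replication throttle in bytes/sec (keeps the migration from saturating the network)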

Complete workflow

Step 1: Generate the reassignment plan

# Create topics-to-move.json (lists the topics to move)
cat > topics-to-move.json <<'EOF'
{
  "topics": [
    {"topic": "orders"},
    {"topic": "payments"},
    {"topic": "user-events"}
  ],
  "version": 1
}
EOF

# Generate the reassignment plan
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "1,2,3,4" \
  --generate

# The output has two parts:
#   Current partition replica assignment           → current layout (worth saving as a rollback file)
#   Proposed partition reassignment configuration  → proposed layout

Step 2: Adjust and save the reassignment JSON

Copy the "Proposed partition reassignment configuration" part of the output and save it as reassignment.json:

{
  "version": 1,
  "partitions": [
    {"topic": "orders",   "partition": 0, "replicas": [2, 4, 1]},
    {"topic": "orders",   "partition": 1, "replicas": [4, 1, 3]},
    {"topic": "orders",   "partition": 2, "replicas": [1, 2, 4]},
    {"topic": "payments", "partition": 0, "replicas": [3, 4, 2]},
    {"topic": "payments", "partition": 1, "replicas": [4, 2, 1]}
  ]
}
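Copying JSON out of terminal output by hand is easy to get wrong, so the split can be scripted. The sketch below is a minimal example that assumes the `--generate` output prints each of the two header lines followed by its JSON on a single line; the exact layout can vary between Kafka versions, and `split_generate_output` and `SAMPLE` are hypothetical names used only for illustration.

```python
import json

CURRENT_HEADER = "Current partition replica assignment"
PROPOSED_HEADER = "Proposed partition reassignment configuration"


def split_generate_output(text):
    """Return (current, proposed) dicts parsed from captured --generate output."""
    sections = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(CURRENT_HEADER):
            header = "current"
        elif line.startswith(PROPOSED_HEADER):
            header = "proposed"
        elif line.startswith("{") and header and header not in sections:
            # Assumption: the JSON document sits on one line right after its header.
            sections[header] = json.loads(line)
    return sections["current"], sections["proposed"]


# Hypothetical captured output, shortened to a single partition.
SAMPLE = """Current partition replica assignment
{"version":1,"partitions":[{"topic":"orders","partition":0,"replicas":[1,2,3]}]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"orders","partition":0,"replicas":[2,4,1]}]}
"""

current, proposed = split_generate_output(SAMPLE)
print(json.dumps(proposed))
```

Writing `proposed` to reassignment.json and `current` to a separate rollback file gives you a plan that restores the original layout if the migration has to be undone.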

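Manual review checklist:

Does every partition have the same number of replicas?
Are the leader replicas spread across different brokers as evenly as possible?
Are the replicas of each partition placed on different brokers?
Do brokers with more disk capacity receive more partitions?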
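The first three checks are mechanical and can be automated before anything is executed. The following sketch assumes the plan uses the same JSON shape as reassignment.json; `check_plan` is a hypothetical helper, and it treats the first replica of each partition as the leader, which is where leadership lands after a preferred leader election.

```python
from collections import Counter


def check_plan(plan, expected_rf=None):
    """Return (problems, leaders_per_broker) for a reassignment plan."""
    problems = []
    leaders = Counter()
    rf_seen = set()
    for p in plan["partitions"]:
        name = f"{p['topic']}-{p['partition']}"
        replicas = p["replicas"]
        rf_seen.add(len(replicas))
        # Replicas of one partition must live on distinct brokers.
        if len(set(replicas)) != len(replicas):
            problems.append(f"{name}: duplicate broker in replicas {replicas}")
        if expected_rf is not None and len(replicas) != expected_rf:
            problems.append(f"{name}: {len(replicas)} replicas, expected {expected_rf}")
        leaders[replicas[0]] += 1
    if expected_rf is None and len(rf_seen) > 1:
        problems.append(f"inconsistent replica counts: {sorted(rf_seen)}")
    return problems, dict(leaders)


plan = {"version": 1, "partitions": [
    {"topic": "orders", "partition": 0, "replicas": [2, 4, 1]},
    {"topic": "orders", "partition": 1, "replicas": [4, 1, 3]},
    {"topic": "orders", "partition": 2, "replicas": [1, 2, 4]},
    {"topic": "payments", "partition": 0, "replicas": [3, 4, 2]},
    {"topic": "payments", "partition": 1, "replicas": [4, 2, 1]},
]}

problems, leaders = check_plan(plan, expected_rf=3)
print(problems)
print(leaders)
```

For the sample plan, Broker 4 ends up as preferred leader of two partitions and the others of one each, which is as even as five partitions on four brokers can get. The disk-capacity check still needs per-broker disk figures that the plan itself does not contain.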

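Step 3: Execute the reassignment (with a throttle)

# ⚠️ Risk level: medium-high
# Start with a low throttle and raise it gradually once nothing looks abnormal

# Execute the reassignment, throttled to 10 MB/s (conservative start)
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --throttle 10485760 \
  --execute

# Output: Successfully started reassignment of partitions.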

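Why the throttle matters: without one, Kafka copies data as fast as it can, which may saturate internal network bandwidth and degrade normal produce and consume traffic.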
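The `--throttle` value is given in bytes per second, so it helps to do the arithmetic before choosing one. The sketch below converts MB/s to the byte value used on the command line and gives a rough lower bound on transfer time. It is a back-of-the-envelope estimate under simplifying assumptions: the helper names are hypothetical, the data volume is whatever must flow through the most heavily loaded broker, and ongoing producer traffic is ignored.

```python
def mb_per_sec_to_bytes(mb_per_sec):
    """Convert a throttle in MB/s (1 MB = 1024 * 1024 bytes) to bytes/sec."""
    return int(mb_per_sec * 1024 * 1024)


def estimate_hours(data_gib, throttle_bytes_per_sec):
    """Rough lower bound on hours needed to copy `data_gib` GiB at the given throttle."""
    return data_gib * 1024 ** 3 / throttle_bytes_per_sec / 3600


print(mb_per_sec_to_bytes(10))   # value for a 10 MB/s throttle
print(mb_per_sec_to_bytes(50))   # value for a 50 MB/s throttle
print(round(estimate_hours(500, mb_per_sec_to_bytes(10)), 1))  # hypothetical 500 GiB to move
```

If the estimate comes out at many hours, that is a signal to plan the staged speed-up in advance rather than to start with a high throttle.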

Step 4: Monitor reassignment progress

# Check reassignment progress
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --verify

# Sample output:
# Status of partition reassignment:
# Reassignment of partition orders-0 is still in progress
# Reassignment of partition orders-1 is completed
# Reassignment of partition orders-2 is completed
# Reassignment of partition payments-0 is still in progress

# "completed" means that partition has finished moving
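With hundreds of partitions, reading the `--verify` output line by line does not scale, so a small parser helps. This sketch assumes status lines shaped like the sample above; the exact wording differs slightly between Kafka versions, so the pattern only matches the prefix "complete". `summarize_verify` is a hypothetical helper.

```python
import re

# Matches "... is completed", "... is complete." and "... is still in progress".
STATUS_LINE = re.compile(r"Reassignment of partition (\S+) is (still in progress|complete)")


def summarize_verify(output):
    """Split partitions named in --verify output into (done, pending) lists."""
    done, pending = [], []
    for match in STATUS_LINE.finditer(output):
        target = done if match.group(2) == "complete" else pending
        target.append(match.group(1))
    return done, pending


sample = """Status of partition reassignment:
Reassignment of partition orders-0 is still in progress
Reassignment of partition orders-1 is completed
Reassignment of partition orders-2 is completed
Reassignment of partition payments-0 is still in progress
"""

done, pending = summarize_verify(sample)
print(f"{len(done)} done, {len(pending)} pending: {pending}")
```

Polling this in a loop and alerting when `pending` stops shrinking is a simple way to notice a stalled migration.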

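Step 5: Adjust the throttle during the migration and clean up afterwards

# Adjust the throttle dynamically (it can be changed at any time during the migration)
# Once traffic looks stable, raise it to 50 MB/s.
# Note: on tool versions that support --cancel (Kafka 2.6 and later, as far as the author of
# this note is aware), re-running --execute while a reassignment is in progress requires --additional.
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --throttle 52428800 \
  --additional \
  --execute

# Cleanup: after every partition reports completed, run --verify once more.
# This run also removes the replication throttle, which otherwise stays configured on the brokers.
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --verify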

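3. Data Consistency Guarantees During Migration

[How data is copied during a partition move]

Partition orders-0 moves from Broker 1 to Broker 4:

1. Broker 4 starts catching up
   Broker 1: [msg1][msg2][msg3][msg4][msg5]...
   Broker 4: [copying, currently at msg2]...

2. Broker 4 reaches msg5, but Broker 1 has meanwhile received msg6 and msg7
   Broker 1: [msg1]...[msg5][msg6][msg7]...
   Broker 4: [msg1]...[msg5]                  ← behind again

3. Broker 4 keeps fetching until the gap is small
   Broker 1: [msg1]...[msg5][msg6][msg7][msg8]
   Broker 4: [msg1]...[msg5][msg6][msg7]      ← 1 message behind

4. Broker 4 catches up with the leader
   → Broker 4 joins the ISR and becomes a regular follower
   → Replication continues as normal; no data is lost

5. The controller updates the partition's replica assignment; Broker 4 serves as a follower first
   → A preferred leader election can later make Broker 4 the leader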

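Key guarantees:

During the migration the original data stays on Broker 1, so nothing is lost
Broker 4 joins the ISR only after it has caught up, which preserves data consistency
Messages can still be produced and consumed normally while the migration runs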
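The catch-up process above only converges if the new replica copies data faster than producers append it. The sketch below works through that relationship with a deliberately simplified model: constant rates, a single partition, and hypothetical names and numbers.

```python
def catch_up_seconds(backlog_bytes, replication_rate, produce_rate):
    """Seconds until the new replica reaches the leader's log end, or None if it never does.

    backlog_bytes:    data already on the leader when copying starts
    replication_rate: bytes/sec the new replica can fetch (bounded by the throttle)
    produce_rate:     bytes/sec producers keep appending to the leader
    """
    net_rate = replication_rate - produce_rate
    if net_rate <= 0:
        return None  # the gap never closes, so the replica cannot join the ISR
    return backlog_bytes / net_rate


MIB = 1024 * 1024
print(catch_up_seconds(100 * MIB, 10 * MIB, 2 * MIB))  # gap closes at 8 MiB/s net
print(catch_up_seconds(100 * MIB, 2 * MIB, 2 * MIB))   # throttle equals write rate
```

The second call returns `None`: a throttle at or below the incoming write rate means the reassignment never finishes, which is why the throttle has to sit above the topic's produce throughput.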

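4. Throttling and Risk Control

Throttling strategy

[Staged throttling strategy]

Stage 1: low initial rate (10 MB/s)
  → Observe for 30 minutes and confirm nothing is abnormal
  → Monitor: produce latency, consumer lag, network traffic

Stage 2: medium rate (30-50 MB/s)
  → Observe for 1 hour and confirm stability
  → Monitor: broker CPU, disk I/O, page cache hit rate

Stage 3: full rate (unlimited or 100 MB/s)
  → Speed up as the migration nears completion
  → The throttle is not lifted automatically; run --verify after completion to remove it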

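Warning signs to monitor

Metric	Normal	Alert threshold	Response
Produce latency	< 10 ms	> 100 ms	Lower the throttle
Consumer lag	Normal fluctuation	Growing steadily	Lower the throttle
Network traffic	< 60% of bandwidth	> 80% of bandwidth	Lower the throttle immediately
Broker CPU	< 60%	> 85%	Pause the migration
ISR shrinkage	None	ISR shrinks	Check network and disks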
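The table can be turned into a simple rule check that a monitoring job runs on each sample. This is a sketch only: the thresholds are copied from the table, while the function name, its parameters, and the way metrics are collected are hypothetical and would depend on your monitoring stack.

```python
def recommend_actions(produce_latency_ms, lag_growing, network_util, cpu_util, isr_shrinking):
    """Map one sample of metrics to the responses listed in the table.

    network_util and cpu_util are fractions between 0 and 1.
    """
    actions = []
    if cpu_util > 0.85:
        actions.append("pause migration")
    if network_util > 0.80:
        actions.append("lower throttle immediately")
    elif produce_latency_ms > 100 or lag_growing:
        actions.append("lower throttle")
    if isr_shrinking:
        actions.append("check network/disk")
    return actions or ["continue"]


print(recommend_actions(8, False, 0.45, 0.40, False))
print(recommend_actions(150, True, 0.70, 0.50, False))
print(recommend_actions(20, False, 0.85, 0.90, True))
```

Keeping the rules in one function makes it easy to review the thresholds with the team before the migration starts.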

Cancelling a reassignment

# If something goes wrong, the reassignment can be cancelled safely
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --cancel

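5. Automated Partition Balancing: Kafka Cruise Control

Writing reassignment JSON by hand is manageable for a few dozen topics and unworkable for a few hundred. LinkedIn open-sourced Cruise Control to solve this problem.

[Cruise Control: automated partition balancing]

              ┌────────────────────────────────────┐
              │           Cruise Control           │
              │                                    │
Metrics    ◄──┤  Load monitoring + analysis engine │
from          │                                    │
brokers       │  Goals:                            │
              │   - Balanced leader distribution   │
              │   - Balanced disk usage            │
              │   - Balanced network traffic       │
              │   - Sensible replica placement     │
              │                                    │
              │  ┌──────────────────────────────┐  │
              │  │ Generates reassignment plans │  │
              │  │ → execute/approve/rollback   │  │
              │  └──────────────────────────────┘  │
              └────────────────────────────────────┘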

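Cruise Control vs. manual reassignment

Dimension	Manual script	Cruise Control
Suitable scale	< 50 topics	Any scale
Optimization goals	Single (specified by hand)	Multiple (leader/disk/network/replica)
Execution	One-off	✅ Continuous optimization
Rollback	Manual	✅ Automatic snapshot rollback
Learning curve	Low	Medium (requires deployment)
Deployment	None needed	Deployed as a separate service
Best for	Emergencies and small clusters	Large production clusters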

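6. Leader Rebalancing: Another Dimension of Balance

Even when partitions are evenly distributed, unbalanced leaders still cause trouble:

# Check the leader distribution
kafka-topics.sh --describe --topic orders --bootstrap-server localhost:9092

# If leaders are concentrated on a few brokers, rebalance them
# Preferred leader election (moves leadership back to the preferred replica)
kafka-leader-election.sh \
  --bootstrap-server localhost:9092 \
  --election-type PREFERRED \
  --topic orders \
  --partition 0

# Batch: rebalance leaders for all topics
kafka-leader-election.sh \
  --bootstrap-server localhost:9092 \
  --election-type PREFERRED \
  --all-topic-partitions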

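Automatic leader rebalancing:

# Broker config: enable automatic leader rebalancing
auto.leader.rebalance.enable=true

# Maximum allowed leader imbalance per broker, in percent (a rebalance is triggered above it)
# 10 = trigger when more than 10% of the partitions whose preferred leader is this broker
#      are currently led by some other broker
leader.imbalance.per.broker.percentage=10

# Check interval
leader.imbalance.check.interval.seconds=300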
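To see how the percentage threshold behaves, the ratio can be computed by hand. The sketch below follows my understanding of the controller's check: for each broker, the share of partitions whose preferred leader (first replica) is that broker but whose current leader is a different one. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def imbalance_ratio(partitions):
    """Per-broker ratio of preferred-leader partitions currently led by another broker.

    Each partition is a dict with 'replicas' (preferred leader first) and 'leader'.
    """
    preferred = defaultdict(int)
    misplaced = defaultdict(int)
    for p in partitions:
        pref = p["replicas"][0]
        preferred[pref] += 1
        if p["leader"] != pref:
            misplaced[pref] += 1
    return {broker: misplaced[broker] / preferred[broker] for broker in preferred}


def brokers_over_threshold(partitions, threshold_percent=10):
    """Brokers whose imbalance ratio exceeds the configured percentage."""
    ratios = imbalance_ratio(partitions)
    return sorted(b for b, r in ratios.items() if r * 100 > threshold_percent)


state = [
    {"replicas": [1, 2], "leader": 1},
    {"replicas": [2, 1], "leader": 1},  # preferred leader is 2, but 1 leads
    {"replicas": [2, 3], "leader": 2},
    {"replicas": [3, 1], "leader": 3},
]

print(imbalance_ratio(state))
print(brokers_over_threshold(state))
```

In the sample, Broker 2 is the preferred leader of two partitions but leads only one of them, a ratio of 0.5, so it exceeds the 10% threshold and a preferred leader election would be triggered for it.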