
The key to stable Android BLE connections is not scanning, but the GATT operation queue

news 2026/6/6 10:00:02

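When people write Android BLE code for the first time, they usually focus on scanning first: whether the device shows up in the scan, whether the permissions are set up correctly, whether the UUID is right.

But once BLE goes into a real project, the problems tend to show up not in scanning but after the connection is established.

The most common scenario goes like this. The device is found, the connection succeeds, the services are discovered, and everything so far looks fine. Then you start to:

Enable notifications
Write a descriptor
Read a characteristic
Write a characteristic
Request an MTU

Once these operations pile up, the code becomes unstable. Sometimes it works, sometimes a callback never arrives, sometimes the state gets scrambled, and sometimes you simply get a status 133. Many people's first reaction is that the Android Bluetooth stack is unstable or that device compatibility is poor. That judgment is not entirely wrong, but in many BLE projects the root cause of the instability is simpler: the GATT operations are not queued.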

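The most commonly misunderstood thing about BLE on Android is that, although it looks like an ordinary API, many of its operations are not synchronous calls, and issuing several at once does not make them run in parallel. Most GATT operations should be treated as serial tasks on a single channel: until the previous one has completed, it is best not to send the next.

That is also why many demos appear to work while real projects start to drift as they grow. A demo may do only one thing, such as writing one packet after connecting, so nothing collides. Real business logic is rarely that simple. You may need to call discoverServices() first, then enable notifications, then write an initialization command, then wait for the device's response, and only then move on. Every one of these steps is driven forward by a callback, so if you fire them off all at once like ordinary function calls, the state quickly falls apart.

The most typical mistake usually looks like this: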

gatt.discoverServices()
gatt.requestMtu(247)
gatt.setCharacteristicNotification(notifyCharacteristic, true)
gatt.writeDescriptor(cccdDescriptor)
gatt.writeCharacteristic(
    writeCharacteristic,
    "hello".toByteArray(),
    BluetoothGattCharacteristic.WRITE_TYPE_DEFAULT
)

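This code looks straightforward, but the problem is just as obvious. A series of BLE operations that should advance in order has been thrown out back to back like ordinary method calls. The usual result is not that "they line themselves up", but that one step gets no callback, another fails, another is overwritten, and eventually the whole connection state turns dirty.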
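To make the failure mode concrete, here is a toy model in plain Kotlin rather than Android code. `FakeGatt` is a hypothetical stand-in that accepts one operation at a time and refuses anything started while it is busy. That is only an approximation of how reads and writes on a real `BluetoothGatt` behave, and all names here are assumptions made for illustration.

```kotlin
// Toy model: a single-slot channel that refuses new work while one operation is in flight.
// FakeGatt is hypothetical; it only imitates the "one operation at a time" behaviour.
class FakeGatt {
    private var inFlight: String? = null
    val completed = mutableListOf<String>()

    /** Returns false when another operation is still pending, like a rejected GATT call. */
    fun start(name: String): Boolean {
        if (inFlight != null) return false
        inFlight = name
        return true
    }

    /** Simulates the completion callback arriving for the pending operation. */
    fun deliverCallback() {
        inFlight?.let { completed += it }
        inFlight = null
    }
}

/** Fires every operation back to back and reports which ones were accepted. */
fun fireAllAtOnce(names: List<String>): List<Boolean> {
    val gatt = FakeGatt()
    return names.map { gatt.start(it) }
}
```

In this model, calling `fireAllAtOnce(listOf("writeDescriptor", "writeCharacteristic", "readCharacteristic"))` accepts only the first operation; the other two are dropped because no callback has been delivered in between.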

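A more reasonable approach is to accept one fact from the very beginning: GATT operations have to be queued.

You can start by defining an operation type that gathers every BLE action into one model: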

sealed class BleOperation {
    data object DiscoverServices : BleOperation()

    data class RequestMtu(val mtu: Int) : BleOperation()

    data class WriteDescriptor(
        val descriptor: BluetoothGattDescriptor,
        val value: ByteArray
    ) : BleOperation()

    data class WriteCharacteristic(
        val characteristic: BluetoothGattCharacteristic,
        val value: ByteArray,
        val writeType: Int
    ) : BleOperation()

    data class ReadCharacteristic(
        val characteristic: BluetoothGattCharacteristic
    ) : BleOperation()
}

With this model in place, the next step is not to call the API right away, but to build an operation queue first.

private val operationQueue: ArrayDeque<BleOperation> = ArrayDeque()
private var currentOperation: BleOperation? = null

Then write a single method for enqueuing:

fun enqueueOperation(operation: BleOperation) {
    operationQueue.add(operation)
    // Start immediately only when nothing is in flight.
    if (currentOperation == null) {
        doNextOperation()
    }
}

When it is time to execute, take only the one operation at the head of the queue:

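private fun doNextOperation() {
    val gatt = bluetoothGatt ?: return
    val operation = operationQueue.removeFirstOrNull() ?: run {
        currentOperation = null
        return
    }
    currentOperation = operation

    // Each call only starts the operation; completion arrives later in a BluetoothGattCallback method.
    // The writeDescriptor(descriptor, value) and writeCharacteristic(characteristic, value, writeType)
    // overloads used here require API 33+.
    val started: Boolean = when (operation) {
        is BleOperation.DiscoverServices -> gatt.discoverServices()
        is BleOperation.RequestMtu -> gatt.requestMtu(operation.mtu)
        is BleOperation.WriteDescriptor ->
            gatt.writeDescriptor(operation.descriptor, operation.value) ==
                BluetoothStatusCodes.SUCCESS
        is BleOperation.WriteCharacteristic ->
            gatt.writeCharacteristic(
                operation.characteristic,
                operation.value,
                operation.writeType
            ) == BluetoothStatusCodes.SUCCESS
        is BleOperation.ReadCharacteristic -> gatt.readCharacteristic(operation.characteristic)
    }

    // If the stack refused the call, no callback will arrive, so move on instead of hanging.
    if (!started) {
        failCurrentOperation("$operation was not accepted")
    }
}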

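At this point the model starts to be right. Your thinking is no longer "whatever I want to do, I call it right now", but "I put the operation in the queue and move to the next one once the current one has completed".

The part that really matters is not enqueueOperation(), but how the callbacks push the queue forward.

For example, after service discovery completes: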

override fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("discoverServices failed: $status")
    }
}

After a characteristic write completes:

override fun onCharacteristicWrite(
    gatt: BluetoothGatt,
    characteristic: BluetoothGattCharacteristic,
    status: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("writeCharacteristic failed: $status")
    }
}

After a descriptor write completes:

override fun onDescriptorWrite(
    gatt: BluetoothGatt,
    descriptor: BluetoothGattDescriptor,
    status: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("writeDescriptor failed: $status")
    }
}

After a characteristic read completes:

override fun onCharacteristicRead(
    gatt: BluetoothGatt,
    characteristic: BluetoothGattCharacteristic,
    value: ByteArray,
    status: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("readCharacteristic failed: $status")
    }
}

Finally, gather the logic that advances the queue in one place:

private fun finishCurrentOperation() {
    currentOperation = null
    doNextOperation()
}

private fun failCurrentOperation(message: String) {
    Log.e("BLE", message)
    // A failed operation is dropped so the rest of the queue can still run.
    currentOperation = null
    doNextOperation()
}

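That gives you the skeleton of a GATT queue.

The biggest value of this structure is not that "the code is more elegant", but that it turns BLE communication from a pile of scattered callbacks into a flow you can reason about. You know what is running now, what is waiting behind it, and which callback should advance which operation.

Take a very common initialization flow. After connecting, many BLE devices need the following steps first:

Discover services
Request an MTU
Enable notifications
Write the initialization command

Without a queue, this code usually ends up messy.
With a queue, the flow becomes much clearer: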

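// cccdDescriptor and writeCharacteristic are assumed to have been looked up already;
// on a fresh connection they only exist after service discovery completes.
// Receiving notifications also needs gatt.setCharacteristicNotification(notifyCharacteristic, true),
// which is a local synchronous call and does not take a queue slot.
enqueueOperation(BleOperation.DiscoverServices)
enqueueOperation(BleOperation.RequestMtu(247))
enqueueOperation(
    BleOperation.WriteDescriptor(
        descriptor = cccdDescriptor,
        value = BluetoothGattDescriptor.ENABLE_NOTIFICATION_VALUE
    )
)
enqueueOperation(
    BleOperation.WriteCharacteristic(
        characteristic = writeCharacteristic,
        value = byteArrayOf(0xA5.toByte(), 0x01, 0x00),
        writeType = BluetoothGattCharacteristic.WRITE_TYPE_DEFAULT
    )
)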

The whole initialization chain then runs one step at a time, and the steps no longer fight each other.
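The ordering can be checked without a device. The sketch below is a plain-Kotlin simulation of the same queue idea: operations are just strings, and a "callback" is simulated by calling `completeCurrent()` by hand. `SerialQueue` and `runInitFlow` are hypothetical names. One thing the simulation makes visible is that every operation type needs a completion signal; on Android, `requestMtu()` reports completion through `onMtuChanged()`, so that callback would also have to advance the queue.

```kotlin
// Plain-Kotlin simulation of a serial operation queue. No Android classes are involved:
// operations are strings, and callbacks are simulated by calling completeCurrent().
class SerialQueue(private val start: (String) -> Unit) {
    private val pending = ArrayDeque<String>()
    private var current: String? = null

    fun enqueue(op: String) {
        pending.addLast(op)
        if (current == null) next()
    }

    /** Call this from the completion callback of the operation in flight. */
    fun completeCurrent() {
        current = null
        next()
    }

    private fun next() {
        val op = pending.removeFirstOrNull() ?: return
        current = op
        start(op)
    }
}

/** Enqueues four init steps at once, then delivers four simulated callbacks. */
fun runInitFlow(): List<String> {
    val log = mutableListOf<String>()
    val queue = SerialQueue { op -> log += "start $op" }
    listOf("discoverServices", "requestMtu", "writeDescriptor", "writeCharacteristic")
        .forEach(queue::enqueue)
    // Only the first operation has started so far; the rest wait for callbacks.
    log += "started so far: ${log.size}"
    repeat(4) { queue.completeCurrent() }
    return log
}
```

Even though all four steps are enqueued in one go, only `discoverServices` starts immediately; each later step starts only after the previous one has been completed.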

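Many people start seeing 133 midway through a BLE project and conclude it is black magic. The system Bluetooth stack does share the blame for 133, but messy state management on your side can just as easily push the connection into that bad state. For example:

The previous connection was never closed with close()
A characteristic write starts before the descriptor write has finished
connect is called repeatedly while a scan is still running
The next step is sent before the previous callback has arrived

None of these is the fault of a single API; the communication chain as a whole is not under control.

So the further a BLE project goes, the less it looks like "calling Bluetooth APIs" and the more it looks like "state machine plus queue scheduling". This is also what makes it harder than ordinary network requests. Network requests are often naturally independent of each other; BLE operations are not. Many BLE steps have strict ordering dependencies, and once the order is wrong, the problems cascade.

If you want to push this further toward production quality, you usually add two more layers.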

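The first layer is timeouts. BLE callbacks are not always reliable, and if an operation never returns, the queue stalls for good. In a real project it is best to give every operation a timeout. For example, if no callback arrives within 5 seconds, treat the operation as failed, clear it, and either move on to the next one or disconnect and reconnect.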
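A minimal sketch of the timeout idea, again in plain Kotlin. To keep it deterministic, the current time is passed in as a number and the deadline is checked in a `tick()` method; a real app would more likely schedule the check with a handler or a coroutine timeout, which is not shown. `TimedQueue` and `timeoutDemo` are hypothetical names, and the policy here (drop the operation and continue) is just one of the two options mentioned above.

```kotlin
// Sketch of a per-operation timeout. Time is injected as a plain number so the
// example is deterministic; nothing here touches Android or real timers.
class TimedQueue(private val timeoutMs: Long, private val start: (String) -> Unit) {
    private val pending = ArrayDeque<String>()
    private var current: String? = null
    private var deadline = 0L
    val timedOut = mutableListOf<String>()

    fun enqueue(op: String, nowMs: Long) {
        pending.addLast(op)
        if (current == null) next(nowMs)
    }

    /** The completion callback arrived in time. */
    fun completeCurrent(nowMs: Long) {
        current = null
        next(nowMs)
    }

    /** Call periodically; abandons the current operation once its deadline has passed. */
    fun tick(nowMs: Long) {
        val op = current ?: return
        if (nowMs >= deadline) {
            timedOut += op
            current = null
            next(nowMs)
        }
    }

    private fun next(nowMs: Long) {
        val op = pending.removeFirstOrNull() ?: return
        current = op
        deadline = nowMs + timeoutMs
        start(op)
    }
}

/** Returns the operations that were started and the ones that timed out. */
fun timeoutDemo(): Pair<List<String>, List<String>> {
    val started = mutableListOf<String>()
    val queue = TimedQueue(timeoutMs = 5_000) { started += it }
    queue.enqueue("writeDescriptor", nowMs = 0)
    queue.enqueue("writeCharacteristic", nowMs = 10)
    queue.tick(nowMs = 4_999)   // still inside the 5 s budget, nothing happens
    queue.tick(nowMs = 5_000)   // deadline reached: writeDescriptor is abandoned
    return started to queue.timedOut
}
```

One caveat with this simple version: a callback that arrives after its operation has already timed out would be mistaken for the completion of the next operation, so a production queue would probably need to match each callback against the operation it belongs to.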

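The second layer is a state machine. The queue answers "how does the current operation run serially"; the state machine answers "what is allowed in the current connection phase". For example, no business packets while disconnected, no enabling notifications before service discovery has finished, and no entering the ready state before notifications are enabled. With both in place, BLE stability improves noticeably.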
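The rules in that paragraph can be written down as a small state machine. This is only a sketch: the phase names and the "one step forward, or back to disconnected" transition rule are assumptions chosen for illustration, not something the Android API defines.

```kotlin
// Sketch of a connection-phase state machine. Phase names and transition rules
// are illustrative assumptions, not part of any Android API.
enum class Phase { DISCONNECTED, CONNECTED, SERVICES_DISCOVERED, NOTIFICATIONS_ENABLED, READY }

class ConnectionStateMachine {
    var phase: Phase = Phase.DISCONNECTED
        private set

    /** Allows one step forward, or a drop back to DISCONNECTED; refuses anything else. */
    fun moveTo(target: Phase): Boolean {
        val isNextStep = target.ordinal == phase.ordinal + 1
        val isDisconnect = target == Phase.DISCONNECTED
        if (!isNextStep && !isDisconnect) return false
        phase = target
        return true
    }

    /** Notifications may only be enabled once services have been discovered. */
    fun canEnableNotifications(): Boolean = phase == Phase.SERVICES_DISCOVERED

    /** Business packets are only allowed in the READY phase. */
    fun canSendBusinessPacket(): Boolean = phase == Phase.READY
}
```

A caller would check `canSendBusinessPacket()` before enqueuing a business write, and call `moveTo(...)` from the callback that completes each initialization step, so that skipping a step (for example jumping straight from `DISCONNECTED` to `READY`) is refused.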

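If you only want to remember one sentence, I think the thing most worth remembering about BLE is not any particular API, but this judgment:

Scanning solves "finding the device"; the GATT queue solves "keeping the connection running reliably".

Many BLE articles focus on scan filters, permission requests, and device list UI. Those matter, of course, but they mostly decide whether you can get started. What really decides whether BLE works reliably over the long run is usually how you organize GATT operations after connecting.

So if your current BLE code already shows symptoms like these:

Callbacks occasionally never arrive
Writes occasionally fail
The initialization flow works only some of the time
Things get less and less stable after a few reconnects

Then instead of continuing to suspect the UUID or trying more devices, stop and check whether your GATT operations are still running unprotected. Quite often the problem is not in scanning at all; you simply never gave GATT a queue.

A minimal usable version

Finally, here is a tightened minimal skeleton that you can copy into your own project and adapt: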

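class BleOperationQueue(
    private val gattProvider: () -> BluetoothGatt?
) {
    private val queue = ArrayDeque<BleOperation>()
    private var current: BleOperation? = null

    fun enqueue(operation: BleOperation) {
        queue.add(operation)
        if (current == null) {
            next()
        }
    }

    // Call from the matching BluetoothGattCallback method when the operation succeeds.
    fun onOperationFinished() {
        current = null
        next()
    }

    // Call when the callback reports an error; the failed operation is dropped.
    fun onOperationFailed() {
        current = null
        next()
    }

    private fun next() {
        val gatt = gattProvider() ?: return
        val op = queue.removeFirstOrNull() ?: run {
            current = null
            return
        }
        current = op

        // The two write overloads used here require API 33+.
        val started: Boolean = when (op) {
            is BleOperation.DiscoverServices -> gatt.discoverServices()
            is BleOperation.RequestMtu -> gatt.requestMtu(op.mtu)
            is BleOperation.ReadCharacteristic -> gatt.readCharacteristic(op.characteristic)
            is BleOperation.WriteDescriptor ->
                gatt.writeDescriptor(op.descriptor, op.value) == BluetoothStatusCodes.SUCCESS
            is BleOperation.WriteCharacteristic ->
                gatt.writeCharacteristic(op.characteristic, op.value, op.writeType) ==
                    BluetoothStatusCodes.SUCCESS
        }

        // A refused call produces no callback, so advance here instead of hanging.
        if (!started) {
            onOperationFailed()
        }
    }
}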