
Making Mixture-of-Experts Robust: A Dual-Model Strategy for Accuracy and Adversarial Defense

Mixture-of-Experts (MoE) models are a core building block of modern large-scale AI systems — from Vision Transformers to large language models. They scale efficiently by routing inputs to specialized expert networks.

But this modular structure hides a structural vulnerability.

In our ICML 2025 paper:

“Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach”

we ask:

Can we make MoE models adversarially robust without sacrificing standard accuracy?

This post explains our key insight, the proposed method, and why it matters.


🔎 Overview of Our Approach

Below is the overview figure from our paper:

*(Figure: overview of the Robust MoE framework from the paper)*

The framework has three main components:

  1. Diagnose the structural weakness of MoE
  2. Robustify expert networks (RT-ER)
  3. Balance robustness and accuracy via a dual-model with joint training (JTDMoE)

1️⃣ Why Are Mixture-of-Experts Vulnerable?

An MoE model consists of:

  • Multiple expert networks $f_i(x)$
  • A router $a_i(x)$ assigning weights
  • A weighted aggregation:

$$
F(x) = \sum_{i=1}^{E} a_i(x)\, f_i(x)
$$

MoE works because different experts specialize in different regions of the input space.
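The aggregation above can be sketched in a few lines. The linear experts, softmax router, and dimensions below are toy assumptions for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

E, d_in, d_out = 4, 8, 3                         # experts, input dim, output dim
W_experts = rng.normal(size=(E, d_in, d_out))    # toy linear experts f_i
W_router = rng.normal(size=(d_in, E))            # toy linear router

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x):
    """F(x) = sum_i a_i(x) f_i(x) with softmax routing weights."""
    a = softmax(x @ W_router)                    # routing weights a_i(x), sum to 1
    f = np.stack([x @ W_experts[i] for i in range(E)])  # expert outputs f_i(x)
    return (a[:, None] * f).sum(axis=0)

x = rng.normal(size=d_in)
y = moe_forward(x)
```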

However, adversarial perturbations reveal a structural asymmetry.

🔬 Our Key Observation

We evaluate four robustness metrics:

  • SA – Standard Accuracy
  • RA – Robust Accuracy (whole model attack)
  • RA-E – Attack targeting experts
  • RA-R – Attack targeting router

On CIFAR-10:

  • Router robustness (RA-R) > 50%
  • Expert robustness (RA-E) < 4%

This means:

The experts — not the router — are the weak link.

Even if the router behaves well, a fragile expert can collapse the entire model.


2️⃣ Why Standard Adversarial Training Fails

Standard adversarial training solves:

$$
\min_\theta \; \max_{\|\delta\| \le \epsilon} \ell(F(x+\delta), y)
$$
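The inner maximization is typically approximated with projected gradient descent (PGD). A minimal sketch on a toy logistic model with an analytic gradient (a real MoE would use autograd; `pgd_attack` and its defaults are our illustrative choices):

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Inner maximization: find delta with ||delta||_inf <= eps that maximizes
    the cross-entropy loss of a toy logistic model sigma(w.x + b)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = w @ (x + delta) + b
        p = 1.0 / (1.0 + np.exp(-z))             # predicted P(class 1)
        grad_x = (p - y) * w                     # gradient of CE loss w.r.t. input
        delta = delta + alpha * np.sign(grad_x)  # ascent step (maximize the loss)
        delta = np.clip(delta, -eps, eps)        # project back into the eps-ball
    return delta

def ce_loss(x, y, w, b):
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```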

But MoE has a modular structure:

  • Routing decisions may change under perturbation.
  • Some experts remain poorly trained.
  • Robustness becomes uneven across experts.

Empirically, we observe:

  • Training instability.
  • Sudden accuracy drops.
  • Robust accuracy stagnating around 54%.

This motivates a component-level solution.


3️⃣ RT-ER: Robust Training with Expert Robustification

Instead of only optimizing the final MoE output, we explicitly regularize expert behavior.

We propose:

RT-ER (Robust Training with Experts’ Robustification)

$$
\mathcal{L}_{rob} =
\max_{\|\delta\| \le \epsilon}
\Big[ \ell_{CE}(F(x+\delta), y)
+ \beta \cdot \ell_{KL}\big(f_2(x+\delta),\, f_2(x)\big) \Big]
$$

Where:

  • $f_2$ is the second-top expert based on routing weight.
  • KL divergence enforces output consistency between clean and adversarial inputs.
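Given an adversarial example that has already been found, the RT-ER loss itself is a small computation. A sketch with illustrative argument names (the routing-weight ranking and probability handling below are our assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between probability vectors."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def rt_er_loss(logits_adv, y, route_w, expert_logits_clean,
               expert_logits_adv, beta=1.0):
    """RT-ER objective for one adversarial example: cross-entropy on the full
    MoE output plus beta times a KL consistency penalty on the second-top
    expert (ranked by routing weight)."""
    ce = -np.log(softmax(logits_adv)[y] + 1e-12)
    i2 = np.argsort(route_w)[-2]                 # second-top expert index
    kl = kl_div(softmax(expert_logits_adv[i2]),
                softmax(expert_logits_clean[i2]))
    return ce + beta * kl
```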

Why the Second-Top Expert?

When adversarial noise alters the routing decision, the second-top expert often becomes active.

If that expert is fragile, robustness collapses.

RT-ER ensures:

  • Experts behave consistently.
  • Routing changes become less harmful.
  • Lipschitz constants decrease.
  • Training becomes stable.

📈 Results

Compared to traditional adversarial training:

  • +16% robust accuracy improvement (CIFAR-10)
  • Significantly more stable training
  • Minimal drop in clean accuracy

Robustness is now distributed across experts.


4️⃣ The Robustness–Accuracy Trade-Off

Even with RT-ER, improving robustness slightly reduces clean accuracy.

So we ask:

Can we balance robustness and accuracy instead of choosing one?


5️⃣ Dual-Model Strategy

We introduce a dual-model framework:

$$
F_D(x) = (1 - \alpha)\, F_S(x) + \alpha\, F_R(x)
$$

Where:

  • $F_S$ = standard MoE
  • $F_R$ = robust MoE (trained via RT-ER)
  • $\alpha \in [0.5, 1]$

This provides a controllable trade-off:

  • Larger $\alpha$ → stronger robustness
  • Smaller $\alpha$ → higher clean accuracy
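The combination itself is a one-liner; the two stand-in models below are toy assumptions for illustration:

```python
import numpy as np

def dual_model(x, F_S, F_R, alpha=0.7):
    """F_D(x) = (1 - alpha) * F_S(x) + alpha * F_R(x); alpha in [0.5, 1]
    dials between clean accuracy (low alpha) and robustness (high alpha)."""
    assert 0.5 <= alpha <= 1.0
    return (1 - alpha) * F_S(x) + alpha * F_R(x)

# Toy stand-ins for the two MoE models (logit vectors over 2 classes):
F_S = lambda x: np.array([2.0, 0.0])   # standard model favors class 0
F_R = lambda x: np.array([0.0, 2.0])   # robust model favors class 1
```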

6️⃣ Theoretical Insight: Certified Robustness Bound

We derive certified robustness bounds for both:

  • Single MoE
  • Dual-model

The bound reveals:

$$
\epsilon \propto \frac{\text{classification margin}}{\text{Lipschitz constants}}
$$

Implications:

  • Increasing margin improves robustness.
  • Reducing expert Lipschitz constants improves robustness.
  • Robustness fundamentally depends on expert stability.
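The general shape of such a certificate follows from a standard Lipschitz argument (a sketch only; the paper's exact bound may differ in constants and norm choice):

```latex
% Let c = \arg\max_j F_j(x) be the predicted class, and define the margin
\[
  m(x) = F_c(x) - \max_{j \ne c} F_j(x).
\]
% If each class score F_j is L-Lipschitz, then
% |F_j(x+\delta) - F_j(x)| \le L \|\delta\|,
% so no score can overtake F_c, and the prediction cannot flip, while
\[
  \|\delta\| < \frac{m(x)}{2L}.
\]
```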

This provides theoretical support for RT-ER.


7️⃣ JTDMoE: Joint Training for Dual-Model

Instead of training models separately, we propose:

JTDMoE (Joint Training for Dual-Model)

Bi-level optimization:

Lower level:

$$
\min_{\Theta_R} \mathcal{L}_{rob}
$$

Upper level:

$$
\min_{\Theta_S, \Theta_R} \ell_{CE}(F_D(x), y)
$$

This aligns:

  • Standard MoE
  • Robust MoE
  • Dual-model margin
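The alternating update pattern can be illustrated on scalar stand-ins. The quadratic losses and targets below are ours, chosen only to make the dynamics visible; they are not the paper's objectives:

```python
import numpy as np

# Scalar stand-ins: each "model" is one parameter producing a score for the
# true class. l_rob stands in for the lower-level robust loss on Theta_R,
# l_ce_dual for the upper-level CE loss on the dual-model output.
ROBUST_OPT, CLEAN_OPT = 2.0, 3.0   # hypothetical optima of the two objectives
ALPHA, LR = 0.7, 0.1

def l_rob(theta_R):
    return (theta_R - ROBUST_OPT) ** 2

def l_ce_dual(theta_S, theta_R):
    f_D = (1 - ALPHA) * theta_S + ALPHA * theta_R
    return (f_D - CLEAN_OPT) ** 2

theta_S, theta_R = 0.0, 0.0
for _ in range(1000):
    # Lower level: gradient step on the robust loss w.r.t. theta_R.
    theta_R -= LR * 2 * (theta_R - ROBUST_OPT)
    # Upper level: gradient step on the dual-model loss w.r.t. both parameters.
    g = 2 * ((1 - ALPHA) * theta_S + ALPHA * theta_R - CLEAN_OPT)
    theta_S -= LR * (1 - ALPHA) * g
    theta_R -= LR * ALPHA * g
```

At the fixed point of this toy system, both objectives are satisfied at once: the robust parameter sits at its own optimum while the standard parameter absorbs the remaining clean-accuracy pressure, which is the intuition behind joint training.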

🚀 Empirical Gains

Under AutoAttack:

  • +20% robust accuracy improvement (CIFAR-10)
  • Clean accuracy improves
  • Margin increases across all classes

This confirms our theoretical prediction.


🔑 Key Takeaways

If you remember three things:

  1. Experts are the structural vulnerability in MoE.
  2. Targeted expert robustification stabilizes the architecture.
  3. A jointly trained dual-model breaks the robustness–accuracy trade-off.

📎 Paper & Code

📄 ICML 2025 Paper
Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach

💻 Code
https://github.com/TIML-Group/Robust-MoE-Dual-Model


Mixture-of-Experts models are becoming foundational in large-scale AI.

Understanding and strengthening their structural robustness is essential for deploying reliable and safe systems.

If you're working on robust ML, MoE architectures, or scalable AI systems — I would love to discuss.

