

news 2026/3/27 6:47:48

MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models

Authors: Mingrui Wu, Hang Liu, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

Deep-Dive Summary:
This paper introduces MICON-Bench, a benchmark for evaluating and improving the ability of unified multimodal models (UMMs) to perform multi-image context generation.

| Model | | | | |
| :---: | :---: | :---: | :---: | :---: |
| BAGEL | 0.3586 | 0.9155 | 0.8766 | 0.6073 |
| BAGEL + DAR | 0.3612 | 0.9201 | 0.8828 | 0.6018 |
| OmniGen2 | 0.3646 | 0.9102 | 0.8742 | 0.6373 |
| OmniGen2 + DAR | 0.3648 | 0.9130 | 0.8757 | 0.6327 |
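To make the effect of DAR in the rows above easier to read, the per-column differences can be computed directly. This is a minimal sketch that only re-uses the numbers in the table. The metric names are not available here, so the columns are addressed by position, and no assumption is made about whether higher or lower is better for any column. The helper name `dar_deltas` is hypothetical.

```python
# Rows copied from the table above. The four metric columns are unnamed
# in this summary, so they are referred to by position only.
scores = {
    "BAGEL": [0.3586, 0.9155, 0.8766, 0.6073],
    "BAGEL + DAR": [0.3612, 0.9201, 0.8828, 0.6018],
    "OmniGen2": [0.3646, 0.9102, 0.8742, 0.6373],
    "OmniGen2 + DAR": [0.3648, 0.9130, 0.8757, 0.6327],
}


def dar_deltas(base: str) -> list[float]:
    """Return (with DAR) minus (without DAR) for each metric column."""
    with_dar = scores[f"{base} + DAR"]
    return [round(d - b, 4) for b, d in zip(scores[base], with_dar)]


for model in ("BAGEL", "OmniGen2"):
    print(model, dar_deltas(model))
```

For both models the first three columns move up slightly with DAR and the fourth moves down slightly; what that means depends on the metric each column reports.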

Table 5: Effect of the number of reference images on UMM performance.

| Model | Ref=2 | Ref=3 | Ref=4 | Ref=5 |
| :---: | :---: | :---: | :---: | :---: |
| BAGEL | 88.50 | 84.37 | 75.11 | 66.36 |
| OmniGen2 | 92.18 | 89.52 | 74.92 | 67.00 |
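The trend in Table 5 can be quantified with a short calculation. The sketch below uses only the values listed in the table and reports the absolute and relative drop in score between the smallest and largest number of reference images. The function name `degradation` is hypothetical, and the score is treated as a plain number because its exact metric is not stated here.

```python
# Values copied from Table 5 (score per number of reference images).
ref_counts = [2, 3, 4, 5]
table5 = {
    "BAGEL": [88.50, 84.37, 75.11, 66.36],
    "OmniGen2": [92.18, 89.52, 74.92, 67.00],
}


def degradation(values):
    """Absolute and relative drop from the first to the last setting."""
    absolute = values[0] - values[-1]
    relative = absolute / values[0]
    return absolute, relative


for model, values in table5.items():
    absolute, relative = degradation(values)
    print(
        f"{model}: -{absolute:.2f} points ({relative:.1%}) "
        f"from Ref={ref_counts[0]} to Ref={ref_counts[-1]}"
    )
```

Both models lose roughly a quarter of their Ref=2 score by Ref=5, and OmniGen2, which starts higher, ends at nearly the same level as BAGEL.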

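Figure 4: DAR suppresses noisy attention and refocuses on the target. Red boxes mark irrelevant regions that are suppressed; green boxes mark target regions that are enhanced.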
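The idea of suppressing attention on irrelevant regions and boosting it on target regions can be illustrated with a toy example. The sketch below is not the paper's DAR rule, which this summary does not specify. It swaps in a plain fixed-factor reweighting followed by renormalization, and the function name `rebalance_attention`, the mask, and the factors `suppress` and `boost` are all assumptions made for illustration.

```python
import numpy as np


def rebalance_attention(attn, target_mask, suppress=0.5, boost=1.5):
    """Toy rebalancing: scale attention on non-target tokens down and on
    target tokens up, then renormalize each row to sum to 1.

    attn: (queries, keys) array of non-negative rows that sum to 1.
    target_mask: (keys,) boolean array, True for tokens in the target region.
    suppress, boost: hypothetical scaling factors, not values from the paper.
    """
    scale = np.where(target_mask, boost, suppress)
    reweighted = attn * scale
    return reweighted / reweighted.sum(axis=-1, keepdims=True)


# One query attending to three key tokens; only the last one is on target.
attn = np.array([[0.4, 0.4, 0.2]])
target_mask = np.array([False, False, True])
print(rebalance_attention(attn, target_mask))
```

Because the weights are renormalized after scaling, the output is still a valid attention distribution and no model parameters change, which is consistent with the training-free framing, although a real mechanism would choose the regions and strengths dynamically at inference time.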

6. Conclusion

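MICON-Bench provides a rigorous evaluation platform for multi-image context generation, while the DAR mechanism offers an effective, low-cost way to address the hallucinations that current UMMs exhibit in cross-image reasoning. Together they lay a foundation for building more reliable multimodal generation systems.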

Original Abstract: Recent advancements in Unified Multimodal Models (UMMs) have enabled remarkable image understanding and generation capabilities. However, while models like Gemini-2.5-Flash-Image show emerging abilities to reason over multiple related images, existing benchmarks rarely address the challenges of multi-image context generation, focusing mainly on text-to-image or single-image editing tasks. In this work, we introduce MICON-Bench, a comprehensive benchmark covering six tasks that evaluate cross-image composition, contextual reasoning, and identity preservation. We further propose an MLLM-driven Evaluation-by-Checkpoint framework for automatic verification of semantic and visual consistency, where multimodal large language model (MLLM) serves as a verifier. Additionally, we present Dynamic Attention Rebalancing (DAR), a training-free, plug-and-play mechanism that dynamically adjusts attention during inference to enhance coherence and reduce hallucinations. Extensive experiments on various state-of-the-art open-source models demonstrate both the rigor of MICON-Bench in exposing multi-image reasoning challenges and the efficacy of DAR in improving generation quality and cross-image coherence. Github: https://github.com/Angusliuuu/MICON-Bench.
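The Evaluation-by-Checkpoint idea mentioned in the abstract, where an MLLM acts as a verifier, can be pictured as scoring a generated image against a list of checks. The sketch below is only an illustration under stated assumptions: the `Checkpoint` type, the verifier signature, the stub verifier, and the pass-fraction aggregation are all hypothetical, since the abstract does not describe how checkpoints are defined or combined.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Checkpoint:
    question: str  # a yes/no check, e.g. about identity or composition


# Hypothetical verifier signature: (image identifier, question) -> pass/fail.
# A real setup would query an MLLM here; this sketch uses a stub instead.
Verifier = Callable[[str, str], bool]


def checkpoint_score(image: str, checkpoints: list[Checkpoint], verify: Verifier) -> float:
    """Fraction of checkpoints the verifier marks as satisfied (assumed aggregation)."""
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")
    passed = sum(verify(image, cp.question) for cp in checkpoints)
    return passed / len(checkpoints)


def stub_verifier(image: str, question: str) -> bool:
    # Stand-in for an MLLM call: pretend any identity check fails.
    return "identity" not in question


checks = [
    Checkpoint("Is the subject's identity preserved?"),
    Checkpoint("Are both reference objects present?"),
    Checkpoint("Is the scene layout consistent with the prompt?"),
]
print(checkpoint_score("generated.png", checks, stub_verifier))
```

Keeping the verifier behind a plain callable means the scoring logic can be tested with a stub, and a model-backed verifier can be substituted later without changing the aggregation.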

PDF Link: 2602.19497v1