VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

Authors: Qing’an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

Original Abstract: Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also frequently appears as visualized text embedded in images, raising the question of whether current VLMs handle such input requests comparably. We introduce VISTA-Bench, a systematic benchmark from multimodal perception, reasoning, to unimodal understanding domains. It evaluates visualized text understanding by contrasting pure-text and visualized-text questions under controlled rendering conditions. Extensive evaluation of over 20 representative VLMs reveals a pronounced modality gap: models that perform well on pure-text queries often degrade substantially when equivalent semantic content is presented as visualized text. This gap is further amplified by increased perceptual difficulty, highlighting sensitivity to rendering variations despite unchanged semantics. Overall, VISTA-Bench provides a principled evaluation framework to diagnose this limitation and to guide progress toward more unified language representations across tokenized text and pixels. The source dataset is available at https://github.com/QingAnLiu/VISTA-Bench.
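
The abstract describes the modality gap only in words, so a small sketch may help make the comparison concrete. The code below is not taken from the VISTA-Bench repository; it is a minimal illustration that assumes each question is answered twice (once as pure text, once as visualized text) and that the gap is reported as the difference in accuracy. The names `PairedResult`, `accuracy`, and `modality_gap`, as well as the sample flags, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PairedResult:
    """Hypothetical record: one question evaluated in both input modalities."""
    question_id: str
    correct_pure_text: bool    # model answered correctly when given tokenized text
    correct_visualized: bool   # model answered correctly when the text was rendered as an image


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values; returns 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def modality_gap(results: List[PairedResult]) -> float:
    """Pure-text accuracy minus visualized-text accuracy (assumed definition).

    A positive value means the model does worse when the same content
    is presented as pixels instead of tokens.
    """
    pure = accuracy(r.correct_pure_text for r in results)
    visualized = accuracy(r.correct_visualized for r in results)
    return pure - visualized


if __name__ == "__main__":
    # Made-up flags for illustration only; these are not benchmark results.
    sample = [
        PairedResult("q1", True, True),
        PairedResult("q2", True, False),
        PairedResult("q3", True, False),
        PairedResult("q4", False, False),
    ]
    print(f"modality gap: {modality_gap(sample):.2f}")
```

Pairing the two outcomes per question, rather than averaging two unrelated question sets, keeps the semantic content fixed so that any difference can be attributed to the input modality.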

PDF Link: 2602.04802v1
