As Vision-Language Models (VLMs) advance, human-centered Assistive Technologies (ATs) for helping People with Visual Impairments (PVIs) are evolving into generalists capable of performing multiple tasks simultaneously. However, benchmarking VLMs for ATs remains under-explored. To bridge this gap, we first create a novel AT benchmark (@Bench). Guided by a pre-design user study with PVIs, our benchmark includes the five most crucial vision-language tasks: Panoptic Segmentation, Depth Estimation, Optical Character Recognition (OCR), Image Captioning, and Visual Question Answering (VQA). In addition, we propose a novel AT model (@Model) that addresses all tasks simultaneously and can be extended with further assistive functions for helping PVIs. By integrating multi-modal information, our framework exhibits outstanding performance across tasks and offers PVIs more comprehensive assistance. Extensive experiments demonstrate the effectiveness and generalizability of our framework.
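A minimal sketch of what such a generalist interface could look like, assuming a single shared backbone with per-task decoders. The class names, task keys, and dispatch pattern below are hypothetical illustrations, not the authors' released @Model API.

```python
# Hypothetical sketch: one generalist model serving all five @Bench tasks
# behind a single call. Names and structure are illustrative assumptions.

from dataclasses import dataclass
from typing import Any, Optional

import numpy as np

# The five vision-language tasks selected by the pre-design user study.
TASKS = (
    "panoptic_segmentation",
    "depth_estimation",
    "ocr",
    "image_captioning",
    "vqa",
)

@dataclass
class ATRequest:
    task: str                       # one of TASKS
    image: np.ndarray               # RGB frame, e.g. shape (H, W, 3)
    question: Optional[str] = None  # free-form text, used only for VQA

class UnifiedATModel:
    """Hypothetical generalist: one shared encoder, per-task decoders."""

    def infer(self, req: ATRequest) -> Any:
        if req.task not in TASKS:
            raise ValueError(f"unsupported task: {req.task!r}")
        if req.task == "vqa" and req.question is None:
            raise ValueError("VQA requests need a question")
        # In a real system, a shared vision-language backbone would run
        # once per frame; a lightweight task head would then decode masks,
        # a depth map, recognized text, a caption, or an answer.
        raise NotImplementedError("placeholder: plug in a trained model")

# Usage: an assistive client could reuse one camera frame across tasks.
# model = UnifiedATModel()
# frame = np.zeros((480, 640, 3), dtype=np.uint8)
# model.infer(ATRequest(task="image_captioning", image=frame))
```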