Prompting and steering techniques are well established in general-purpose generative AI, yet assistive visual question answering (VQA) tools for blind users still follow rigid interaction patterns with limited opportunities for customization. User control can be helpful when system responses are misaligned with users' goals and contexts, a gap that becomes especially consequential for blind users who may rely on these systems for access. We invited 11 blind users to customize their interactions with a real-world conversational VQA system. Drawing on 418 interactions, reflections, and post-study interviews, we analyze the prompting-based techniques participants adopted, including those introduced in the study and those developed independently in real-world settings. VQA interactions were often lengthy: participants averaged 3 turns per conversation, sometimes up to 21, with input text typically one-tenth the length of the responses they heard. Although built on state-of-the-art LLMs, the system lacked verbosity controls, was limited in estimating distance in space and time, relied on inaccessible image framing, and offered little to no camera guidance. We discuss how customization techniques such as prompt engineering can help participants work around these limitations. Alongside a new publicly available dataset, we offer insights for interaction design at both the query and system levels.