We present VOICE, a novel approach for connecting the conversational capabilities of large language models (LLMs) with interactive exploratory visualization. VOICE introduces several technical contributions that drive our conversational visualization framework. Its foundation is a pack-of-bots, each of which performs a specific task, such as assigning tasks, extracting instructions, or generating coherent content. We employ fine-tuning and prompt engineering to tailor each bot's performance to its role and to respond accurately to user queries, and a new prompt-based iterative scene-tree generation establishes a coupling with a structural model. Our text-to-visualization method generates a flythrough sequence matching the content explanation. Finally, 3D natural language interaction provides capabilities to navigate and manipulate the 3D models in real time. The VOICE framework can receive arbitrary voice commands from the user and respond verbally, with each response tightly coupled to the corresponding visual representation at low latency and high accuracy. We demonstrate the effectiveness and high generalizability potential of our approach by applying it to two distinct domains: the analysis of three 3D molecular models with multi-scale and multi-instance attributes, and a cartographic map visualization. A free copy of this paper and all supplemental materials are available at https://osf.io/g7fbr/.