VoxML is a modeling language that maps natural language expressions into real-time visualizations using commonsense semantic knowledge of objects and events. Its utility has been demonstrated in embodied simulation environments and in agent-object interactions in situated multimodal human-agent collaboration and communication. It introduces the notion of object affordance (both Gibsonian and Telic) from human-robot interaction (HRI) and robotics, as well as the concept of habitat (an object's context of use), to characterize interactions between a rational agent and an object. This paper aims to specify VoxML as an annotation language in general abstract terms. It then shows how the language applies to annotating linguistic data that express visually perceptible human-object interactions. The annotation structures thus generated are interpreted against the enriched minimal model created by VoxML as a modeling language, while supporting the modeling purposes of VoxML linguistically.