Vision-based Interfaces (VIs) are pivotal in advancing Human-Computer Interaction (HCI), particularly in enhancing context awareness. Rapid advances in multimodal Artificial Intelligence (AI) open significant opportunities for these interfaces, promising a future of tight coupling between humans and intelligent systems. When integrated with other modalities, AI-driven VIs offer a robust means of capturing and interpreting user intentions and complex environmental information, thereby enabling seamless and efficient interaction. This PhD study explores three application cases of multimodal interfaces that augment context awareness, each focusing on one of three dimensions of the visual modality (scale, depth, and time): fine-grained analysis of physical surfaces via microscopic imaging, precise projection onto the real world using depth data, and rendering of haptic feedback from video backgrounds in virtual environments.