Face-to-face communication modeling is an area of computer vision research that develops algorithms to recognize and analyze non-verbal cues and behaviors during face-to-face interactions. We propose an alternative to text chat for human-AI interaction that relies on non-verbal visual communication only, using facial expressions and head movements that mirror the human user's but also improvise on them, in order to engage users and capture their attention efficiently, at low cost, and in real time. Our goal is to track and analyze facial expressions and other non-verbal cues in real time, and to use this information to build models that can predict and understand human behavior. We offer three complementary approaches, based on retrieval, statistical, and deep learning techniques. We provide human as well as automatic evaluations and discuss the advantages and disadvantages of each approach.
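To make the retrieval-based direction more concrete, the following is a minimal sketch of how a response could be looked up from a user's non-verbal cues. The abstract does not specify the method's details, so everything here is an assumption for illustration: the four-value feature layout (smile, brow raise, head yaw, head pitch), the toy database, and the names `euclidean` and `retrieve_response` are hypothetical, and plain nearest-neighbor search stands in for whatever retrieval scheme is actually used.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical feature vector: [smile, brow raise, head yaw, head pitch].
Features = Sequence[float]


def euclidean(a: Features, b: Features) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_response(
    query: Features, database: List[Tuple[Features, Features]]
) -> Features:
    """Return the stored response whose paired cue is nearest to the query."""
    if not database:
        raise ValueError("database is empty")
    _, response = min(database, key=lambda pair: euclidean(query, pair[0]))
    return response


# Toy (observed user cue, stored response) pairs; all values are made up.
DATABASE = [
    ([0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.1]),  # smile -> smile, slight nod
    ([0.0, 0.8, 0.0, 0.0], [0.1, 0.7, 0.1, 0.0]),  # raised brows -> raised brows
    ([0.0, 0.0, 0.6, 0.0], [0.0, 0.0, 0.5, 0.0]),  # head turn -> follow the turn
]

# A mostly smiling user frame is matched to the first entry.
response = retrieve_response([0.85, 0.05, 0.05, 0.0], DATABASE)
```

In this sketch the stored responses differ slightly from their paired cues, which is one simple way to picture a reply that mirrors the user while also varying on what was observed.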