Multimodal scene search in conversations is essential for uncovering valuable insights into social dynamics and improving how we communicate. While experts in conversational analysis bring their own knowledge and skills to finding key scenes, the lack of comprehensive, user-friendly tools that streamline the processing of diverse multimodal queries limits both their efficiency and their objectivity. To address this gap, we developed Providence, a visual-programming-based tool built on design considerations derived from a formative study with experts. It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code. Our study showed that Providence offered preferable usability and satisfactory output while imposing less cognitive load when accomplishing conversational scene search tasks, confirming the importance of its customizability and transparency. Furthermore, through an in-the-wild trial, we confirmed that the tool's objectivity and reusability transform experts' workflows, suggesting the advantage of expert-AI teaming in highly human-contextual domains.