Tutorial videos are a popular help source for learning feature-rich software. However, getting quick answers to questions about tutorial videos is difficult. We present an automated approach for responding to tutorial questions. By analyzing 633 questions found in 5,944 video comments, we identified different question types and observed that users frequently described parts of the video in questions. We then asked participants (N=24) to watch tutorial videos and ask questions while annotating the video with relevant visual anchors. Most visual anchors referred to UI elements and the application workspace. Based on these insights, we built AQuA, a pipeline that generates useful answers to questions with visual anchors. We demonstrate this for Fusion 360, showing that we can recognize UI elements in visual anchors and generate answers using GPT-4 augmented with that visual information and software documentation. An evaluation study (N=16) demonstrates that our approach provides better answers than baseline methods.