Tutorial videos of mobile apps have become a popular and compelling way for users to learn unfamiliar app features. To make these videos accessible, video creators often need to annotate the actions shown in them, including which actions are performed and where to tap. However, this process can be time-consuming and labor-intensive. In this paper, we introduce Video2Action, a lightweight approach that uses image-processing and deep-learning methods to automatically generate action scenes and predict action locations from a video. Automated experiments demonstrate that Video2Action performs well in acquiring actions from videos, and a user study shows that the generated action cues are useful in assisting video creators with action annotation.