Mixed-media tutorials, which integrate videos, images, text, and diagrams to teach procedural skills, offer a more browsable alternative to timeline-based videos. However, creating such tutorials manually is tedious, and existing automated solutions are often restricted to a particular domain. While AI models hold promise, it is unclear how to harness their capabilities effectively, given the multi-modal data involved and the vast landscape of available models. We present TutoAI, a cross-domain framework for AI-assisted mixed-media tutorial creation on physical tasks. First, we distill common tutorial components by surveying existing work; then, we present an approach to identify, assemble, and evaluate AI models for component extraction; finally, we propose guidelines for designing user interfaces (UI) that support tutorial creation based on AI-generated components. In preliminary user studies, TutoAI achieved quality higher than or similar to that of a baseline model.