In this report, we describe the vision, challenges, and scientific contributions of the Task Wizard team, TWIZ, in the Alexa Prize TaskBot Challenge 2022. Our vision is to build the TWIZ bot as a helpful, multimodal, knowledgeable, and engaging assistant that can guide users towards the successful completion of complex manual tasks. To achieve this, we focus our efforts on three main research questions: (1) Humanly-Shaped Conversations, by providing information in a knowledgeable way; (2) Multimodal Stimulus, making use of various modalities including voice, images, and videos; and (3) Zero-shot Conversational Flows, to improve the robustness of the interaction to unseen scenarios. TWIZ is an assistant capable of supporting a wide range of tasks, with several innovative features such as creative cooking, video navigation through voice, and the robust TWIZ-LLM, a Large Language Model trained for dialogue about complex manual tasks. Based on ratings and feedback provided by users, we observed that the TWIZ bot is an effective and robust system, capable of guiding users through tasks while providing several multimodal stimuli.