By combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. Traditional multimodal frameworks require laborious developer effort to support rich multimodal commands, where a user's command may involve an exponential number of combinations of actions and function invocations. This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model, enabling developers to create efficient and capable multimodal interfaces with ease. ReactGenie translates multimodal user commands into NLPL (Natural Language Programming Language), a programming language we created, using a neural semantic parser based on large language models. The ReactGenie runtime interprets the parsed NLPL and composes primitives in the computational model to implement complex user commands. As a result, ReactGenie allows easy implementation and unprecedented richness in commands for end-users of multimodal apps. Our evaluation showed that 12 developers could learn ReactGenie and build a nontrivial application in under 2.5 hours on average. In addition, compared with a traditional GUI, end-users could complete tasks faster and with lower task load using ReactGenie apps.