"Get ready for a party": Exploring smarter smart spaces with help from large language models

The right response to someone who says "get ready for a party" is deeply influenced by meaning and context. For a smart home assistant (e.g., Google Home), the ideal response might be to survey the available devices in the home and change their state to create a festive atmosphere. Current practical systems cannot service such requests, because doing so requires the ability to (1) infer the meaning behind an abstract statement and (2) map that inference to a concrete course of action appropriate for the context (e.g., changing the settings of specific devices). In this paper, we leverage the observation that recent task-agnostic large language models (LLMs) like GPT-3 embody a vast amount of cross-domain, sometimes unpredictable contextual knowledge that existing rule-based home assistant systems lack. This knowledge can make LLMs powerful tools for inferring user intent and generating appropriate context-dependent responses during smart home interactions. We first explore the feasibility of a system that places an LLM at the center of command inference and action planning, showing that LLMs have the capacity to infer the intent behind vague, context-dependent commands like "get ready for a party" and respond with concrete, machine-parseable instructions that can be used to control smart devices. We further demonstrate a proof-of-concept implementation that puts an LLM in control of real devices, showing its ability to infer intent and change device state appropriately with no fine-tuning or task-specific training. Our work hints at the promise of LLM-driven systems for context-awareness in smart environments, motivating future research in this area.
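As an illustration of what "concrete, machine-parseable instructions" could look like, the following is a minimal sketch and not the implementation described in the paper. It assumes the device state is serialized as JSON in the prompt and that the model replies with a JSON object of settings to change. The device names, the prompt wording, and the helpers `build_prompt`, `apply_response`, and `fake_llm` are all hypothetical; `fake_llm` returns a canned reply in place of a real model call.

```python
import json


def build_prompt(devices: dict, command: str) -> str:
    """Describe the current device state and ask for a JSON object of changes."""
    return (
        "You control a smart home. Current device state (JSON):\n"
        f"{json.dumps(devices, indent=2)}\n"
        f'The user says: "{command}"\n'
        "Reply with only a JSON object mapping device names "
        "to the settings to change."
    )


def apply_response(devices: dict, response_text: str) -> dict:
    """Parse the model's reply and return an updated copy of the device state.

    Unknown devices and unknown settings are ignored rather than trusted.
    """
    changes = json.loads(response_text)
    updated = {name: dict(settings) for name, settings in devices.items()}
    for name, new_settings in changes.items():
        if name not in updated or not isinstance(new_settings, dict):
            continue
        for key, value in new_settings.items():
            if key in updated[name]:
                updated[name][key] = value
    return updated


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned reply."""
    return json.dumps({
        "living_room_light": {"on": True, "color": "purple"},
        "speaker": {"on": True, "volume": 60},
        "disco_ball": {"on": True},  # not a known device, so it is ignored
    })


# Assumed example devices, for illustration only.
devices = {
    "living_room_light": {"on": False, "color": "white"},
    "speaker": {"on": False, "volume": 20},
}
prompt = build_prompt(devices, "get ready for a party")
new_state = apply_response(devices, fake_llm(prompt))
print(new_state)
```

Validating the reply against the known devices before applying it is one simple guard against a model naming devices or settings that do not exist in the home.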
