Users of natural language interfaces, which are generally powered by Large Language Models (LLMs), often must repeat their preferences each time they make a similar request. We describe an approach to LLM-based dialogue modeling in which persistent user constraints and preferences, collectively termed standing instructions, are provided as additional context for such interfaces. For example, when a user states "I'm hungry", a previously expressed preference for Persian food can be automatically added to the LLM prompt, influencing the search for relevant restaurants. We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains, where each dialogue is paired with a user profile (a set of user-specific standing instructions) and corresponding structured representations (API calls). A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue. NLSI contains diverse phenomena, from simple preferences to interdependent instructions such as triggering a hotel search whenever the user is booking tickets to an event. We conduct experiments on NLSI using prompting with large language models and various retrieval approaches, achieving a maximum of 44.7% exact match on API prediction. Our results demonstrate the challenges in identifying the relevant standing instructions and interpreting them into API calls.
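The task described above (select the applicable standing instructions, add them to the prompt, and score predicted API calls by exact match) can be illustrated with a minimal sketch. This is not the NLSI implementation: the `StandingInstruction` class, its `domain` field, the helper functions, and the `GetRestaurants(...)` call string are all hypothetical names chosen for illustration. The sketch also assumes the dialogue's domain is already known, which sidesteps the selection problem the abstract identifies as the key challenge, and it assumes exact match compares the predicted and gold API calls as unordered sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandingInstruction:
    """One persistent user preference (hypothetical schema, not from NLSI)."""
    domain: str  # assumed field: the domain the preference applies to
    text: str    # the preference in natural language


def select_instructions(profile, domain):
    """Keep only the instructions whose domain matches the dialogue's domain.

    Simplifying assumption: the domain is given. Deciding which
    instructions apply is the hard part of the real task.
    """
    return [si for si in profile if si.domain == domain]


def build_prompt(utterance, instructions):
    """Prepend the selected standing instructions to the user utterance."""
    lines = ["Standing instructions:"]
    lines += [f"- {si.text}" for si in instructions]
    lines.append(f"User: {utterance}")
    return "\n".join(lines)


def exact_match(predicted_calls, gold_calls):
    """True only if the predicted API calls equal the gold calls.

    Assumption: calls are compared as an unordered set of strings.
    """
    return set(predicted_calls) == set(gold_calls)


# Hypothetical user profile mirroring the example in the abstract.
profile = [
    StandingInstruction("Restaurants", "My preferred cuisine is Persian."),
    StandingInstruction("Hotels", "I prefer hotels with free parking."),
]

selected = select_instructions(profile, "Restaurants")
prompt = build_prompt("I'm hungry", selected)
print(prompt)
```

Under these assumptions, the prompt for "I'm hungry" carries the Persian-food preference but not the unrelated hotel preference, and a prediction counts as correct only when every API call matches the gold annotation.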