Users of natural language interfaces, generally powered by Large Language Models (LLMs), must often repeat their preferences each time they make a similar request. To alleviate this, we propose including some of a user's preferences and instructions in natural language (collectively termed standing instructions) as additional context for such interfaces. For example, when a user states "I'm hungry", their previously expressed preference for Persian food is automatically added to the LLM prompt, so as to influence the search for relevant restaurants. We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains, where each dialogue is paired with a user profile (a set of user-specific standing instructions) and corresponding structured representations (API calls). A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue. NLSI contains diverse phenomena, from simple preferences to interdependent instructions, such as triggering a hotel search whenever the user is booking tickets to an event. We conduct experiments on NLSI using prompting with large language models and various retrieval approaches, achieving a maximum of 44.7% exact match on API prediction. Our results demonstrate how challenging it is to identify the relevant standing instructions and to interpret them into API calls.
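The pipeline described in the abstract (select the applicable standing instructions, add them to the prompt, then score the predicted API calls by exact match) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from NLSI: the `StandingInstruction` class, the domain-label filter standing in for a real retrieval approach, the prompt layout, the `GetRestaurants(...)` call format, and the order-insensitive reading of exact match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandingInstruction:
    domain: str  # hypothetical domain label, e.g. "Restaurants"
    text: str    # the instruction, written in natural language


def select_instructions(profile, domains):
    """Keep the instructions whose domain label is in `domains`.

    A stand-in for a real retrieval step; it assumes the relevant
    domains of the dialogue are already known.
    """
    wanted = set(domains)
    return [ins for ins in profile if ins.domain in wanted]


def build_prompt(utterance, selected):
    """Place the selected instructions ahead of the user utterance."""
    lines = ["Standing instructions:"]
    lines += [f"- {ins.text}" for ins in selected]
    lines += ["", f"User: {utterance}", "API calls:"]
    return "\n".join(lines)


def exact_match(predicted, gold):
    """Return 1.0 if both lists contain the same set of calls, else 0.0.

    Assumption: calls are compared as strings, ignoring order.
    """
    return float(set(predicted) == set(gold))


# Hypothetical user profile with two standing instructions.
profile = [
    StandingInstruction("Restaurants", "If I ask for restaurants, my preferred cuisine is Persian."),
    StandingInstruction("Hotels", "I prefer hotels with at least three stars."),
]

selected = select_instructions(profile, ["Restaurants"])
prompt = build_prompt("I'm hungry", selected)

# Hypothetical gold API call for this dialogue.
gold = ['GetRestaurants(cuisine="Persian")']
```

In this sketch only the restaurant preference reaches the prompt. An interdependent instruction, such as a hotel search triggered by an event booking, would need a selection step that can follow links across domains, which a flat domain filter like this one cannot do.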