In executable task-oriented semantic parsing, the system aims to translate users' natural-language utterances into machine-interpretable programs (API calls) that can be executed according to pre-defined API specifications. With the popularity of Large Language Models (LLMs), in-context learning offers a strong baseline for such scenarios, especially in data-limited regimes. However, LLMs are known to hallucinate, which makes constraining their generated content a formidable challenge. It therefore remains uncertain whether LLMs can effectively perform task-oriented utterance-to-API generation, where respecting the API's structural and task-specific constraints is crucial. In this work, we seek to measure, analyze, and mitigate such constraint violations. First, we identify the categories of constraints involved in obtaining API semantics from task-oriented utterances, and define fine-grained metrics that complement traditional ones. Second, we use these metrics to conduct a detailed error analysis of the constraint violations seen in state-of-the-art LLMs, which motivates us to investigate two mitigation strategies: Semantic-Retrieval of Demonstrations (SRD) and API-aware Constrained Decoding (API-CD). Our experiments show that these strategies are effective at reducing constraint violations and improving the quality of the generated API calls, but their implementation complexity and latency warrant careful consideration.
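To make the notion of a constraint violation concrete, the following is a minimal sketch of how a generated API call might be checked against a specification. It is an illustration only, not the metrics defined in this work: the `ApiSpec` layout, the `find_violations` and `violation_rate` helpers, the `BookFlight` function, and the three violation types shown (unknown function, unknown argument, missing required argument) are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ApiSpec:
    """Hypothetical API specification.

    Maps each function name to its "allowed" and "required" argument names.
    """
    functions: dict = field(default_factory=dict)


def find_violations(call: dict, spec: ApiSpec) -> list:
    """Return constraint violations for one generated API call.

    `call` is assumed to look like {"name": str, "args": {arg_name: value}}.
    """
    name = call.get("name")
    if name not in spec.functions:
        # A hallucinated function: nothing else can be checked against the spec.
        return [f"unknown function: {name}"]

    allowed = spec.functions[name]["allowed"]
    required = spec.functions[name]["required"]
    given = set(call.get("args", {}))

    violations = []
    # Arguments the model produced that the spec does not define.
    for arg in sorted(given - allowed):
        violations.append(f"unknown argument: {arg}")
    # Arguments the spec requires that the model left out.
    for arg in sorted(required - given):
        violations.append(f"missing required argument: {arg}")
    return violations


def violation_rate(calls: list, spec: ApiSpec) -> float:
    """Fraction of calls that contain at least one violation."""
    if not calls:
        return 0.0
    return sum(1 for c in calls if find_violations(c, spec)) / len(calls)


# Illustrative spec and model outputs (all names are made up).
spec = ApiSpec({
    "BookFlight": {
        "allowed": {"origin", "destination", "date"},
        "required": {"origin", "destination"},
    }
})
good = {"name": "BookFlight", "args": {"origin": "SEA", "destination": "NYC"}}
bad = {"name": "BookFlight", "args": {"origin": "SEA", "city": "NYC"}}
```

A per-call check of this kind reports which constraint was broken rather than only whether the output matches the reference, which is the sense in which fine-grained metrics complement traditional ones. The same specification could in principle also drive constrained decoding by restricting which function and argument names may be generated, though that is not shown here.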