High-level specifications of code are inherently ambiguous, and prior systems have explored interactive techniques to help users clarify their intent and resolve such ambiguities. However, most existing approaches elicit supervision through labeled examples, which are often error-prone and may fail to capture user intent. This paper introduces a new active learning paradigm for program disambiguation based on multiple-choice queries. In this paradigm, the system presents a small set of high-level behaviors as multiple-choice options, and the user simply selects the intended one. Technically, each answer option corresponds to a Hoare triple that characterizes a cluster of semantically similar candidate programs. This formulation enables formal reasoning about the informativeness and interpretability of queries, and supports systematic construction of optimal queries. Building on this insight, we develop a new active learning algorithm and implement it in a tool called Socrates, which automatically synthesizes informative multiple-choice queries for program disambiguation. We evaluate Socrates across four domains spanning both symbolic and neurosymbolic settings and show that it produces intuitive, easy-to-answer queries and achieves efficient convergence. Most importantly, Socrates identifies the intended program more reliably than existing methods, while maintaining competitive runtime performance.
翻译:高级代码规范天然具有歧义性,现有系统已探索交互技术帮助用户明确意图并消除此类歧义。然而,大多数现有方法通过标注示例获取监督信号,这种方式易出错且可能无法捕捉用户意图。本文提出一种基于多选查询的程序歧义消除主动学习范式。在该范式中,系统将少量高层级行为呈现为多选选项,用户只需选择目标选项即可。从技术层面,每个答案选项对应一个描述语义相似候选程序聚类的霍尔三元组。这一形式化方法能够对查询的信息量与可解释性进行形式推理,并支持系统化构建最优查询。基于此洞察,我们开发了一种新型主动学习算法,并在名为Socrates的工具中实现该算法,该工具可自动为程序歧义消除合成信息丰富的多选查询。我们在涵盖符号化与神经符号化设置的四类领域上评估Socrates,表明其能生成直观易答的查询,并实现高效收敛。最重要的是,Socrates比现有方法更可靠地识别目标程序,同时保持竞争力的运行时性能。