Ambiguity poses frequent challenges when meaning is represented in natural language, yet it is often ignored or deliberately removed in tasks that map language to formally designed representations, which generally assume a one-to-one mapping between linguistic and formal representations. We attempt to address this shortcoming by introducing AmP, a framework, dataset, and challenge for translating ambiguous natural language into formal representations such as logic and code. We define templates and generate data for five well-documented linguistic ambiguities. Using AmP, we investigate how several few-shot text-to-code systems handle ambiguity, and we introduce three new metrics. We find that large pre-trained models perform poorly at capturing the distribution of possible meanings without deliberate instruction. However, models are able to capture the distribution well when ambiguity is attested in their inputs. These results motivate a call for including ambiguity explicitly in datasets and for considering the distribution of possible outputs when evaluating systems. Data and code: https://github.com/esteng/ambiguous_parsing
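To make the idea of "capturing the distribution of possible meanings" concrete, the following is a minimal sketch and not the AmP implementation or any of its three metrics. It assumes a classic prepositional-phrase attachment example chosen purely for illustration, two hypothetical logical forms for it, a hypothetical 50/50 reference distribution over the readings, and total variation distance as a stand-in comparison measure. The names `READINGS`, `reading_distribution`, and `total_variation` are illustrative and do not come from the released code.

```python
from collections import Counter

# Hypothetical logical forms for the ambiguous sentence
# "the boy saw the man with the telescope" (illustrative, not from AmP).
READINGS = {
    # The boy used the telescope to see the man.
    "instrument": "exists e . saw(e) & agent(e, boy) & patient(e, man) & instrument(e, telescope)",
    # The man being seen has the telescope.
    "possession": "exists e . saw(e) & agent(e, boy) & patient(e, man) & have(man, telescope)",
}


def reading_distribution(samples):
    """Return the fraction of sampled parses that match each known reading.

    Parses matching neither reading are counted under "other".
    """
    if not samples:
        raise ValueError("need at least one sampled parse")
    lf_to_name = {lf: name for name, lf in READINGS.items()}
    counts = Counter(lf_to_name.get(s, "other") for s in samples)
    return {name: counts.get(name, 0) / len(samples) for name in [*READINGS, "other"]}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Ten hypothetical parses sampled from a model: nine pick one reading, one the other.
samples = [READINGS["instrument"]] * 9 + [READINGS["possession"]] * 1
predicted = reading_distribution(samples)

# Assumed reference: both readings equally likely.
reference = {"instrument": 0.5, "possession": 0.5, "other": 0.0}
distance = total_variation(predicted, reference)
print(predicted, distance)
```

In this sketch, a system that always commits to a single reading sits far from the assumed reference distribution even though each of its individual outputs is a valid parse, which is the kind of behavior that single-answer accuracy would not reveal.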