Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent few-shot language models trained on code have been shown to generate these representations better than traditional unimodal language models, which are trained on downstream tasks. Despite these advances, existing fine-tuned neural semantic parsers are susceptible to adversarial attacks on natural-language inputs. While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios, as it requires both substantial computational resources and expensive human annotation of in-domain semantic parsing data. This paper presents the first empirical study of the adversarial robustness of a large prompt-based language model of code, Codex. Our results demonstrate that state-of-the-art (SOTA) language models of code are vulnerable to carefully crafted adversarial examples. To address this challenge, we propose methods for improving robustness without the need for large amounts of labeled data or heavy computational resources.