Semantic parsing aims to construct a structured representation of the meaning of a natural-language question. Recent few-shot language models trained on code have been shown to generate these representations better than traditional unimodal language models, which are trained on downstream tasks. Despite these advances, existing fine-tuned neural semantic parsers are susceptible to adversarial attacks on natural-language inputs. Although the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios, because it requires both substantial computational resources and expensive human annotation of in-domain semantic parsing data. This paper presents the first empirical study of the adversarial robustness of a large prompt-based language model of code, Codex. Our results demonstrate that state-of-the-art (SOTA) code language models are vulnerable to carefully crafted adversarial examples. To address this challenge, we propose methods for improving robustness without requiring large amounts of labeled data or heavy computational resources.