Code generation is the automatic conversion of natural language (NL) utterances into code snippets. Sequence-to-tree (Seq2Tree) approaches guarantee the grammatical correctness of the generated code by producing each Abstract Syntax Tree (AST) node conditioned on the previously predicted (antecedent) AST nodes. Existing Seq2Tree methods tend to treat antecedent and subsequent predictions equally. Under the AST constraints, however, a Seq2Tree model can hardly produce a correct subsequent prediction from an incorrect antecedent prediction, so antecedent predictions ought to receive more attention than subsequent ones. To this end, we propose Antecedent Prioritized (AP) Loss, which helps the model attach importance to antecedent predictions by exploiting the position information of the generated AST nodes. To model this position information, we design an AST-to-Vector (AST2Vec) method that maps AST node positions to two-dimensional vectors. To evaluate the proposed loss, we implement and train an Antecedent Prioritized Tree-based code generation model called APT. With better antecedent predictions and the accompanying subsequent predictions, APT significantly improves performance. Extensive experiments on four benchmark datasets demonstrate the superiority and generality of the proposed method.
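To make the idea of a position-aware loss concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that a node's two-dimensional position is the pair (depth, sibling index) and that the weight decays exponentially with the sum of the two coordinates; the names `ast_positions`, `position_weights`, `weighted_nll`, and the `decay` parameter are all hypothetical. The abstract does not specify how AST2Vec or AP Loss are defined, so this only illustrates the general principle that errors on earlier nodes can be penalized more heavily than errors on later ones.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A toy AST node (hypothetical structure, for illustration only)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def ast_positions(root: Node) -> List[Tuple[str, Tuple[int, int]]]:
    """Return (label, (depth, sibling_index)) pairs in pre-order.

    Assumption: this 2-D position stands in for the paper's AST2Vec mapping.
    """
    out: List[Tuple[str, Tuple[int, int]]] = []

    def visit(node: Node, depth: int, index: int) -> None:
        out.append((node.label, (depth, index)))
        for i, child in enumerate(node.children):
            visit(child, depth + 1, i)

    visit(root, 0, 0)
    return out


def position_weights(positions: List[Tuple[int, int]], decay: float = 0.5) -> List[float]:
    """Assumed weighting: shallower, further-left nodes weigh more.

    Weights are normalised so that their mean is 1.
    """
    raw = [math.exp(-decay * (depth + index)) for depth, index in positions]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_nll(gold_probs: List[float], weights: List[float]) -> float:
    """Position-weighted negative log-likelihood of the gold node at each step."""
    return sum(-w * math.log(p) for p, w in zip(gold_probs, weights)) / len(gold_probs)


# Toy tree: Call(Name, Args(Num, Num))
tree = Node("Call", [Node("Name"), Node("Args", [Node("Num"), Node("Num")])])
positions = [pos for _, pos in ast_positions(tree)]
weights = position_weights(positions)

# The same low-confidence prediction, placed at the first vs. the last node.
early_error = weighted_nll([0.1, 0.9, 0.9, 0.9, 0.9], weights)
late_error = weighted_nll([0.9, 0.9, 0.9, 0.9, 0.1], weights)
```

Under these assumptions, the same mistake costs more when it occurs at the root than when it occurs at the last leaf, which is the behavior the abstract motivates. In a real model the per-step probabilities would come from the decoder, and the weighting function would follow the paper's own definition.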