We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g., pseudocode-to-code translation, where the generated code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e., outputs without corresponding inputs, are freely available (e.g., code on GitHub) and provide information about output validity. We can capture the output structure by pre-training a denoiser to denoise corrupted versions of unlabeled outputs. We first show that standard fine-tuning after pre-training destroys some of this structure. We then propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser; the denoiser is kept frozen to preserve output structure. For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). The improvement from composed fine-tuning is magnified on out-of-distribution (OOD) examples (4% and 25% relative).
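To make the composition concrete, the following is a minimal toy sketch, not the implementation used in this work. It assumes a deliberately simple setting: valid outputs are 2-D vectors whose components are equal, the frozen denoiser is a fixed projection onto that set (standing in for a denoiser pre-trained on unlabeled outputs), and the base predictor is a linear map. All names (`DENOISER`, `denoise`, `predict`, `composed`, `composed_fine_tune`) and the hyperparameters are hypothetical and chosen only for illustration.

```python
# Toy sketch of composed fine-tuning. Every name and constant here is an
# illustrative assumption, not taken from the paper's code.

# Frozen "denoiser": orthogonal projection onto the valid set {y : y[0] == y[1]}.
# It stands in for a denoiser pre-trained on unlabeled outputs and is never updated.
DENOISER = [[0.5, 0.5],
            [0.5, 0.5]]


def denoise(v):
    """Apply the frozen denoiser to a 2-D vector."""
    return [sum(DENOISER[i][j] * v[j] for j in range(2)) for i in range(2)]


def predict(w, x):
    """Trainable base predictor: linear map from a scalar input to a 2-D output."""
    return [w[0] * x, w[1] * x]


def composed(w, x):
    """Composed model: frozen denoiser applied to the base predictor's output."""
    return denoise(predict(w, x))


def composed_fine_tune(data, w, lr=0.05, epochs=200):
    """SGD on the squared error of the composed model; only w is updated."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            out = composed(w, x)
            residual = [out[i] - y[i] for i in range(2)]
            # Backpropagate through the frozen denoiser: P^T @ residual.
            back = [sum(DENOISER[i][j] * residual[i] for i in range(2))
                    for j in range(2)]
            # Gradient of ||P (w x) - y||^2 with respect to w is 2 * x * back.
            w = [w[j] - lr * 2.0 * x * back[j] for j in range(2)]
    return w


# Labeled pairs whose targets are valid outputs (both components equal 2x).
data = [(x, [2.0 * x, 2.0 * x]) for x in (0.5, 1.0, 1.5)]
w = composed_fine_tune(data, w=[0.3, -0.7])

print("base predictor output:", predict(w, 3.0))
print("composed output:      ", composed(w, 3.0))
```

In this toy setting the base predictor's raw output need not be valid on its own; the frozen denoiser maps it onto the valid set, so the predictor only has to learn enough for the composition to be correct. This is meant to give intuition for the claim that composition reduces the complexity required of the predictor, under the simplifying assumptions stated above.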