The rise of pre-trained code models has significantly enhanced coding tasks such as code completion, as well as tools like GitHub Copilot. However, the sheer size of these models, particularly the larger ones, makes fine-tuning them for specific downstream tasks challenging. Retrieval-based methods have emerged as a promising alternative that augments model predictions without fine-tuning. Their designs, however, often rely on heuristics, leaving critical questions unanswered: what information should be stored or retrieved, and how should that information be interpolated to augment predictions? To tackle this challenge, we first perform a theoretical analysis of the fine-tuning process, which highlights the importance of delta logits as a catalyst for improving model predictions. Building on this insight, we develop a novel retrieval-based method, FT2Ra, which aims to mimic genuine fine-tuning. Although FT2Ra is retrieval-based, it uniquely adopts a learning rate and multi-epoch retrievals, a paradigm similar to fine-tuning. In token-level completion, a relatively easier task, FT2Ra achieves a 4.29% improvement in accuracy over the best baseline method on UniXcoder. In the more challenging line-level completion task, Exact Match (EM) performance more than doubles, indicating the significant advantages of our theoretical analysis. Notably, even without actual fine-tuning, FT2Ra is competitive with models that have undergone real fine-tuning.
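The idea of nudging logits with retrieved information over several learning-rate-scaled rounds can be illustrated with a small sketch. This is not the FT2Ra implementation: the datastore layout, the function names (`softmax`, `retrieval_update`), the Euclidean nearest-neighbor search, and the update rule are all assumptions made for illustration. The sketch relies only on the standard fact that the gradient of cross-entropy with respect to the logits is `softmax(z) - one_hot(y)`, so its negative can play the role of a delta logit.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def retrieval_update(query, logits, store_keys, store_targets,
                     k=2, lr=1.0, epochs=5):
    """Hypothetical retrieval-driven logit update (illustrative only).

    query         : (d,) context embedding of the current prediction step
    logits        : (V,) logits the frozen model produced for that step
    store_keys    : (N, d) stored context embeddings
    store_targets : (N,) ground-truth token ids for the stored contexts
    """
    vocab = logits.shape[0]

    # Retrieve the k nearest stored contexts (assumed Euclidean distance).
    dists = np.linalg.norm(store_keys - query, axis=1)
    neighbors = np.argsort(dists)[:k]

    # Empirical target distribution over the retrieved neighbors.
    target_dist = np.eye(vocab)[store_targets[neighbors]].mean(axis=0)

    z = logits.astype(float).copy()
    for _ in range(epochs):
        # Negative cross-entropy gradient w.r.t. the logits, i.e. a delta
        # logit, recomputed each epoch because the prediction has moved.
        delta = target_dist - softmax(z)
        z = z + lr * delta
    return z


# Toy datastore: two nearby contexts labelled with token 2, one far away.
keys = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
targets = np.array([2, 2, 0])
base_logits = np.array([2.0, 0.0, 0.0])  # frozen model prefers token 0
updated = retrieval_update(np.array([0.0, 0.2]), base_logits, keys, targets)
```

In this toy setup the learning rate controls how far each round moves the logits and the number of epochs controls how many rounds are applied, which mirrors the role those two knobs play in gradient-based fine-tuning; the model weights themselves are never touched.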