Deploying language models often requires navigating accuracy vs. performance trade-offs to meet latency constraints while preserving utility. Traditional model distillation reduces model size, but at substantial cost: a separate student model must be trained. We introduce ModularStarEncoder (MoSE), a 1-billion-parameter multi-exit encoder for code retrieval and classification that employs a novel Self-Distillation mechanism. This mechanism significantly strengthens lower-layer representations, allowing different portions of the model to be deployed flexibly with favorable performance trade-offs. Our architecture improves text-to-code and code-to-code search by attaching exit heads to specific encoder layers, where higher layers guide earlier ones during training, improving intermediate representations at minimal additional cost. We further enhance MoSE with a repository-level contextual loss that maximizes utilization of the training context window. Additionally, we release a new dataset, created through code translation, that extends text-to-code benchmarks with cross-language code-to-code pairs. Evaluations demonstrate the effectiveness of Self-Distillation as a principled approach to trading inference cost for accuracy across various code understanding tasks.
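The multi-exit Self-Distillation idea described above can be illustrated with a minimal sketch: the logits from the deepest exit head act as a teacher for the earlier exits, whose losses combine a task term with a temperature-scaled KL term. All names here (`self_distillation_loss`, the choice of classification logits, the loss weighting `alpha`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, t=1.0):
    """Numerically stable softmax with optional temperature t."""
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(exit_logits, labels, alpha=0.5, temperature=2.0):
    """Hypothetical multi-exit self-distillation objective (a sketch, not MoSE's code).

    exit_logits: list of (batch, num_classes) arrays, ordered shallow -> deep.
    labels:      (batch,) integer class labels.
    The deepest exit trains on the task loss alone; each earlier exit mixes its
    own task loss with a KL term pulling it toward the deepest exit's outputs.
    """
    def cross_entropy(logits):
        p = softmax(logits)
        return -np.log(p[np.arange(len(labels)), labels]).mean()

    teacher = softmax(exit_logits[-1], temperature)  # deepest exit as teacher
    total = cross_entropy(exit_logits[-1])           # teacher: task loss only
    for student_logits in exit_logits[:-1]:
        task = cross_entropy(student_logits)
        s = softmax(student_logits, temperature)
        # KL(teacher || student), rescaled by T^2 as in standard distillation
        kl = (teacher * (np.log(teacher) - np.log(s))).sum(axis=-1).mean()
        total += (1.0 - alpha) * task + alpha * kl * temperature ** 2
    return total
```

At inference, one would simply truncate the encoder at any exit head and use that layer's representation, trading depth (cost) for accuracy as the abstract describes.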