Adapting LLMs to low-resource languages is difficult: labeled data is scarce, full-model fine-tuning is unstable, and continued cross-lingual tuning can cause catastrophic forgetting. We propose Circuit-Targeted Supervised Fine-Tuning (CT-SFT): a counterfactual-free adaptation of CD-T (Contextual Decomposition Transformer) that uses a label-balanced mean baseline and task-directional relevance scoring to identify a sparse set of task-relevant attention heads in a proxy-language checkpoint, then transfers to a target language by updating only those heads (plus LayerNorm) via head-level gradient masking. Across NusaX-Senti and XNLI, CT-SFT improves cross-lingual accuracy over continued full fine-tuning while updating only a small subset of model parameters. We find an editing-versus-preserving trade-off: harder transfers favor editing the circuit heads, while easier transfers often favor updating near-zero-relevance (non-circuit) heads, preserving the source mechanism. CT-SFT also substantially reduces catastrophic forgetting, preserving proxy/source-language competence during transfer.
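The head-level gradient masking mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, the row-major head layout of the projection matrix, and the choice of which heads form the circuit are all illustrative assumptions. The idea is simply to zero out the gradient rows belonging to attention heads outside the identified circuit before the optimizer step, so only circuit heads receive updates.

```python
import numpy as np

def mask_head_gradients(grad, head_dim, circuit_heads):
    """Zero gradients for attention heads outside the circuit.

    Assumes (hypothetically) that the output-projection gradient is laid
    out as (num_heads * head_dim, d_model), with head h occupying rows
    [h*head_dim, (h+1)*head_dim). Rows of non-circuit heads are zeroed,
    so a subsequent optimizer step leaves those heads untouched.
    """
    masked = grad.copy()
    num_heads = grad.shape[0] // head_dim
    for h in range(num_heads):
        if h not in circuit_heads:
            masked[h * head_dim:(h + 1) * head_dim, :] = 0.0
    return masked

# Toy example: 8 heads of dimension 4, d_model = 16;
# suppose relevance scoring selected heads 1 and 3 as the circuit.
grad = np.ones((8 * 4, 16))
masked = mask_head_gradients(grad, head_dim=4, circuit_heads={1, 3})
```

In a real training loop this masking would typically be applied via per-parameter gradient hooks (e.g. PyTorch's `Tensor.register_hook`) on each attention projection, with LayerNorm parameters left unmasked as the abstract describes.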