Adapting models to a language that was only partially present in the pre-training data requires fine-tuning, which is expensive in terms of both data and computational resources. As an alternative to fine-tuning, we explore the potential of activation steering-based techniques to enhance model performance on Italian tasks. Our experiments show that Italian steering (i) can be successfully applied to different models, (ii) achieves performance comparable to, or even better than, that of models fine-tuned for Italian, and (iii) yields higher-quality and more consistent Italian generations. We also discuss the utility of steering and fine-tuning in the current LLM landscape, where models already reach strong Italian performance even without explicit training in the language.
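To make the idea concrete, here is a minimal sketch of one common activation-steering recipe: compute a difference-of-means vector between a model's hidden activations on target-language (Italian) and base-language prompts, then add a scaled copy of that vector to the hidden states at generation time. This is an illustration of the general technique, not the paper's exact construction; the toy arrays, dimensions, and the scaling factor `alpha` are all hypothetical.

```python
import numpy as np


def steering_vector(acts_target, acts_base):
    """Difference-of-means steering vector over a batch of activations.

    A common recipe in the steering literature; the paper's exact
    construction may differ.
    """
    return acts_target.mean(axis=0) - acts_base.mean(axis=0)


def apply_steering(hidden, vec, alpha=1.0):
    """Add the scaled steering vector to every hidden state in the batch."""
    return hidden + alpha * vec


# Toy random arrays standing in for one layer's activations
# (in practice these would come from forward passes on real prompts).
rng = np.random.default_rng(0)
italian_acts = rng.normal(1.0, 0.1, size=(8, 16))  # hypothetical Italian-prompt activations
english_acts = rng.normal(0.0, 0.1, size=(8, 16))  # hypothetical base-prompt activations

v = steering_vector(italian_acts, english_acts)
steered = apply_steering(english_acts, v, alpha=1.0)
```

In a real model, `apply_steering` would typically be installed as a forward hook on a chosen transformer layer, so the shift is applied during generation without updating any weights; this is what makes steering far cheaper than fine-tuning.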