This work investigates whether small-scale LMs can benefit from instruction tuning. We compare conversational and question-answering instruction-tuning datasets, applied in either a merged or a sequential curriculum, using decoder-only models with 100M and 140M parameters. Evaluation spans both fine-tuning (SuperGLUE) and zero-shot (BLiMP, EWoK, WUGs, entity tracking, and psycholinguistic correlation) settings. Results show that instruction tuning yields small but consistent gains in fine-tuning scenarios, with sequential curricula outperforming merged data; however, these improvements do not transfer consistently to zero-shot tasks, suggesting a trade-off between interaction-focused adaptation and broad linguistic generalization. These results highlight both the potential and the constraints of adapting human-inspired learning strategies to low-resource LMs, and point toward hybrid, curriculum-based approaches for enhancing generalization under ecological training limits.
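Since the merged versus sequential contrast is central to the setup, the sketch below illustrates the two curricula in schematic training code. This is a minimal sketch assuming the Hugging Face `datasets` library; the toy examples, the `instruction_tune` helper, and the model stand-in are hypothetical placeholders for illustration, not the authors' actual pipeline.

```python
# Minimal sketch of the two instruction-tuning curricula compared above.
# Assumes the Hugging Face `datasets` library; dataset contents,
# `instruction_tune`, and the model object are hypothetical placeholders.
from datasets import Dataset, concatenate_datasets

# Toy stand-ins for the conversational and question-answering datasets.
conv = Dataset.from_dict({"text": ["User: hi\nAssistant: hello"]})
qa = Dataset.from_dict({"text": ["Q: What is 2 + 2?\nA: 4"]})

def instruction_tune(model, data):
    """Placeholder for one instruction-tuning pass over `data`."""
    for example in data:
        pass  # a forward/backward training step would go here
    return model

model = object()  # stand-in for a 100M/140M-parameter decoder-only LM

# Merged curriculum: shuffle both sources together, one tuning phase.
merged = concatenate_datasets([conv, qa]).shuffle(seed=0)
model = instruction_tune(model, merged)

# Sequential curriculum: tune on one source, then continue on the other.
model = instruction_tune(model, conv)
model = instruction_tune(model, qa)
```

The only difference between the two regimes is the ordering of examples: the merged curriculum interleaves both data sources in a single phase, while the sequential curriculum exposes the model to each source in turn.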