Small Language Models (SLMs) are generally considered more compact versions of large language models (LLMs). This study investigates the ability of SLMs with 1 to 3 billion parameters to learn, retain, and subsequently eliminate different types of noise present in the data. Four pre-trained SLMs were used for this purpose: Olmo 1B, Qwen1.5 1.8B, Gemma 2B, and Phi2 2.7B. The models were instruction-tuned on noise-free data and tested to determine whether they could learn noise from in-context examples. Noise patterns were then introduced during instruction tuning to evaluate the models' noise learning, unlearning, and retention capabilities. Olmo, the smallest model, was highly sensitive to noise and quickly adapted to noisy patterns. Phi2 resisted learning character-level and transliteration noise, likely due to its carefully curated, structured, and high-quality pretraining data. Gemma excelled with transliteration noise, likely benefiting from its multilingual pretraining. These findings can inform the development of robust training strategies for SLMs.