We describe our strategy for the 2025 edition of the BabyLM Challenge. Our main contribution is an improved form of Masked Language Modeling (MLM) that adapts the masking probability of each token according to the model's ability to predict it. The results show a substantial increase in performance on (Super)GLUE tasks over standard MLM. We also incorporate sub-token embeddings, finding that they improve the model's morphological generalization. Our submission beats the baseline in the strict-small track.
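The abstract does not spell out how the masking probabilities are adapted. As a rough illustration only, the sketch below assumes one plausible reading: per-token prediction losses from a previous pass are converted into per-token masking probabilities whose mean matches the usual 15% MLM budget, so harder-to-predict tokens are masked more often. The function name `adaptive_mask_probs` and all parameter choices are hypothetical, not taken from the submission.

```python
import torch

def adaptive_mask_probs(token_losses: torch.Tensor,
                        base_rate: float = 0.15,
                        temperature: float = 1.0) -> torch.Tensor:
    """Map per-token prediction losses to masking probabilities.

    Tokens the model predicts poorly (high loss) receive a higher
    probability of being masked; probabilities are scaled so their
    mean equals `base_rate`, keeping the overall masking budget of
    standard MLM.
    """
    weights = torch.softmax(token_losses / temperature, dim=-1)
    probs = weights * base_rate * token_losses.numel()
    return probs.clamp(max=1.0)

# Hypothetical usage: losses recorded for one sequence on a previous pass.
token_losses = torch.tensor([0.2, 2.5, 0.1, 1.7, 0.4, 3.0])
probs = adaptive_mask_probs(token_losses)
mask = torch.bernoulli(probs).bool()  # positions to mask in the next step
print(probs, mask)
```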