Pre-trained language models (LMs) have, over the last few years, grown substantially in both societal adoption and training cost. This rapid growth in scale has constrained progress in understanding and mitigating their biases. Since re-training LMs is prohibitively expensive, most debiasing work has focused on post-hoc or masking-based strategies, which often fail to address the underlying causes of bias. In this work, we seek to democratise pre-model debiasing research by using low-cost proxy models. Specifically, we investigate BabyLMs: compact BERT-like models trained on small, mutable corpora that can approximate the bias acquisition and learning dynamics of larger models. We show that, despite their drastically reduced size, BabyLMs display patterns of intrinsic bias formation and performance development closely aligned with those of standard BERT models. Furthermore, the correlations between BabyLMs and BERT hold across multiple intra-model and post-model debiasing methods. Leveraging these similarities, we conduct pre-model debiasing experiments with BabyLMs, replicating prior findings and presenting new insights into the influence of gender imbalance and toxicity on bias formation. Our results demonstrate that BabyLMs can serve as an effective sandbox for large-scale LMs, reducing pre-training costs from over 500 GPU-hours to under 30 GPU-hours. This provides a way to democratise pre-model debiasing research and enables faster, more accessible exploration of methods for building fairer LMs.