Pre-trained language models (LMs) have, over the last few years, grown substantially in both societal adoption and training costs. This rapid growth has constrained progress in understanding and mitigating their biases. Since re-training LMs is prohibitively expensive, most debiasing work has focused on post-hoc or masking-based strategies, which often fail to address the underlying causes of bias. In this work, we seek to democratise pre-model debiasing research by using low-cost proxy models. Specifically, we investigate BabyLMs, compact BERT-like models trained on small and mutable corpora, which can approximate the bias acquisition and learning dynamics of larger models. We show that BabyLMs exhibit patterns of intrinsic bias formation and performance development closely aligned with those of standard BERT models, despite their drastically reduced size. Furthermore, the correlations between BabyLMs and BERT hold across multiple intra-model and post-model debiasing methods. Leveraging these similarities, we conduct pre-model debiasing experiments with BabyLMs, replicating prior findings and presenting new insights into the influence of gender imbalance and toxicity on bias formation. Our results demonstrate that BabyLMs can serve as an effective sandbox for large-scale LMs, reducing pre-training costs from over 500 GPU-hours to under 30 GPU-hours. This provides a way to democratise pre-model debiasing research and enables faster, more accessible exploration of methods for building fairer LMs.