In this paper, we describe the University of Lyon 2 submission to the Strict-Small track of the BabyLM competition. The shared task emphasizes small-scale language modelling from scratch on limited-size data and on human language acquisition. The dataset released for the Strict-Small track contains 10M words, which is comparable to children's vocabulary size. We approach the task with an architecture search that minimizes the masked language modelling loss on the shared-task data. Having found an optimal configuration, we introduce two small-size language models (LMs) that were submitted for evaluation: a 4-layer encoder with 8 attention heads and a 6-layer decoder with 12 heads, which we term Bebeshka and Zlata, respectively. Despite being half the scale of the baseline LMs, our proposed models achieve comparable performance. We further explore the applicability of small-scale language models in tasks involving moral judgment, aligning their predictions with human values. These findings highlight the potential of compact LMs in addressing practical language understanding tasks.
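To make the idea of an architecture search over layers and attention heads concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify the search procedure, so an exhaustive grid search is swapped in for illustration. The names `Config`, `candidate_configs`, `search`, and `toy_loss` are hypothetical. In practice the `evaluate` callback would train a model on the shared-task data and return its held-out masked language modelling loss; here a toy stand-in replaces it, and its values are not measured results.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical architecture description (field names are illustrative)."""
    num_layers: int
    num_heads: int


def candidate_configs(layers: Iterable[int], heads: Iterable[int]) -> List[Config]:
    """Enumerate every (layers, heads) combination in the grid."""
    return [Config(n_layers, n_heads) for n_layers, n_heads in product(layers, heads)]


def search(configs: Iterable[Config],
           evaluate: Callable[[Config], float]) -> Tuple[Config, float]:
    """Return the configuration with the lowest loss reported by `evaluate`."""
    scored = [(evaluate(config), config) for config in configs]
    best_loss, best_config = min(scored, key=lambda pair: pair[0])
    return best_config, best_loss


def toy_loss(config: Config) -> float:
    """Toy stand-in for a validation loss, NOT a measured value.

    It is constructed so that its minimum lies at 4 layers and 8 heads,
    purely to exercise the search loop.
    """
    return abs(config.num_layers - 4) + abs(config.num_heads - 8)


if __name__ == "__main__":
    grid = candidate_configs(layers=[2, 4, 6], heads=[4, 8, 12])
    best, loss = search(grid, toy_loss)
    print(best, loss)
```

Passing the loss function in as a callback keeps the search loop independent of any training framework, so the same loop could drive a real training and evaluation routine.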