We present ToddlerBERTa, a BabyBERTa-like language model, and explore its capabilities through five models with different hyperparameter settings. Evaluating on BLiMP, SuperGLUE, MSGS, and the Supplement benchmark from the BabyLM challenge, we find that smaller models can excel at specific tasks, while larger models perform well when given substantial data. Despite being trained on a smaller dataset, ToddlerBERTa performs competitively, rivalling the state-of-the-art RoBERTa-base. The model shows robust language understanding even with single-sentence pretraining, and it competes with baselines that leverage broader contextual information. Our work provides insights into hyperparameter choices and data utilization, contributing to the advancement of language models.