The goal of BabyLM is to stimulate new research connections between cognitive modeling and language model pretraining. We invite contributions in this vein to the BabyLM Workshop, which will also include the 4th iteration of the BabyLM Challenge. As in previous years, the challenge features two ``standard'' tracks (Strict and Strict-Small), in which participants must train language models on under 100M or 10M words of data, respectively. This year, we move beyond our previous English-only pretraining datasets with a new Multilingual track focusing on English, Dutch, and Chinese. For the workshop, we call for papers related to the overall theme of BabyLM, including training efficiency, small-scale training datasets, cognitive modeling, model evaluation, and architectural innovation.