Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-intensive. Yet, such scaling laws have not been fully explored in dense retrieval, owing to the discrete nature of retrieval metrics and the complex relationship between training data and model size in retrieval tasks. In this study, we investigate whether the performance of dense retrieval models follows scaling laws in the same way as other neural models. We propose to use contrastive log-likelihood as the evaluation metric and conduct extensive experiments with dense retrieval models of different parameter counts, trained with different amounts of annotated data. Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling with respect to model size and the number of annotations. Additionally, we examine scaling under prevalent data augmentation methods to assess the impact of annotation quality, and apply the scaling law to find the best resource allocation strategy under a budget constraint. We believe these insights will significantly contribute to understanding the scaling effect of dense retrieval models and offer meaningful guidance for future research.
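As a minimal sketch of what fitting such a power law looks like in practice: a power law L(N) = (A / N)^α becomes linear in log space, so the exponent can be recovered with an ordinary least-squares fit. All numbers below (model sizes, loss values, the exponent 0.3) are hypothetical and purely illustrative, not results from the paper.

```python
import numpy as np

# Hypothetical model sizes (parameter counts) and contrastive
# log-likelihood losses generated from an assumed power law
# L(N) = (A / N)**alpha with A = 1e8 and alpha = 0.3.
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = (1e8 / sizes) ** 0.3

# In log space: log L = alpha * log A - alpha * log N,
# i.e. a straight line in log N with slope -alpha.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_fit = -slope
print(f"fitted scaling exponent alpha ~= {alpha_fit:.3f}")
```

The same log-linear fit applies to scaling in the number of annotations; with noisy real measurements one would typically fit an additional irreducible-loss offset with a nonlinear optimizer instead.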