Click-through rate (CTR) prediction is a critical task for many applications, as its accuracy has a direct impact on user experience and platform revenue. In recent years, CTR prediction has been widely studied in both academia and industry, resulting in a wide variety of CTR prediction models. Unfortunately, there is still a lack of standardized benchmarks and uniform evaluation protocols for CTR prediction research. This leads to non-reproducible or even inconsistent experimental results among existing studies, which largely limits the practical value and potential impact of this research. In this work, we aim to perform open benchmarking for CTR prediction and present a rigorous comparison of different models in a reproducible manner. To this end, we ran over 7,000 experiments, totaling more than 12,000 GPU hours, to re-evaluate 24 existing models on multiple datasets and settings. Surprisingly, our experiments show that with sufficient hyper-parameter search and model tuning, the performance differences among many deep models are smaller than expected. The results also reveal that making real progress on the modeling of CTR prediction is indeed a very challenging research task. We believe that our benchmarking work not only allows researchers to gauge the effectiveness of new models conveniently but also lets them compare new models fairly with the state of the art. We have publicly released the benchmarking code, evaluation protocols, and hyper-parameter settings of our work to promote reproducible research in this field.