On Gradient Boosted Decision Trees and Neural Rankers: A Case-Study on Short-Video Recommendations at ShareChat

Practitioners who wish to build real-world applications that rely on ranking models need to decide which modelling paradigm to follow. This is not an easy choice to make, as the research literature on this topic has been shifting in recent years. In particular, whilst Gradient Boosted Decision Trees (GBDTs) have reigned supreme for more than a decade, the flexibility of neural networks has allowed them to catch up, and recent works report accuracy metrics that are on par. Nevertheless, practical systems require considerations beyond mere accuracy metrics to decide on a modelling approach. This work describes our experiences in balancing some of the trade-offs that arise, presenting a case study on a short-video recommendation application. We highlight that (1) neural networks' ability to handle large training datasets, together with user and item embeddings, allows for more accurate models than GBDTs in this setting, and (2) because GBDTs are less reliant on specialised hardware, they can provide an equally accurate model at a lower cost. We believe these findings are of relevance to researchers in both academia and industry, and hope they can inspire practitioners who need to make similar modelling choices in the future.

