Ranking models, i.e., coarse-ranking and fine-ranking models, serve as core components in large-scale recommendation systems, responsible for scoring massive numbers of candidate items based on user preferences. To meet the stringent latency requirements of online serving, structural lightweighting or knowledge distillation techniques are commonly employed to accelerate ranking models. However, these approaches typically incur a non-negligible drop in accuracy. Notably, lossless acceleration through optimization of the feature fusion matrix multiplication, particularly via structural reparameterization, remains underexplored. In this paper, we propose MaRI, a novel Matrix Re-parameterized Inference framework, which complements existing techniques and accelerates ranking model inference without any accuracy loss. MaRI is motivated by the observation that user-side computation is redundant in the feature fusion matrix multiplication, and we therefore adopt the philosophy of structural reparameterization to eliminate this redundancy.
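The redundancy the abstract alludes to can be illustrated with a minimal sketch (not the paper's actual architecture; all shapes and variable names here are hypothetical): when a fusion layer multiplies a weight matrix against the concatenation of one user vector and many item vectors, the user-side block of the product is identical for every item and can be computed once and reused.

```python
import numpy as np

# Hypothetical dimensions: one user vector scored against many items.
d_u, d_i, d_out, n_items = 64, 32, 128, 1000
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_u + d_i))   # fusion weight matrix
u = rng.standard_normal(d_u)                  # user-side features
items = rng.standard_normal((n_items, d_i))   # item-side features

# Naive fusion: concatenate the user vector with every item, then multiply.
# The user-side part of this product is recomputed n_items times.
naive = np.concatenate([np.tile(u, (n_items, 1)), items], axis=1) @ W.T

# Reparameterized form: split W into its user and item column blocks.
# W_u @ u is computed a single time and broadcast across all items.
W_u, W_i = W[:, :d_u], W[:, d_u:]
repar = items @ W_i.T + (W_u @ u)

assert np.allclose(naive, repar)  # identical outputs, hence lossless
```

The two forms are algebraically equal, so the speedup comes purely from moving the shared user-side product out of the per-item loop, which matches the abstract's claim of acceleration without accuracy loss.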