Deep models have driven significant advances in click-through rate (CTR) prediction. While vertical scaling via layer stacking improves model expressiveness, the layer-by-layer sequential computation poses challenges to efficient scaling. Conversely, horizontal scaling through Mixture of Experts (MoE) achieves efficient scaling by activating a small subset of experts in parallel, but flat MoE layers may struggle to capture the hierarchical structure inherent in recommendation tasks. To push the Return-On-Investment (ROI) boundary, we explore the complementary strengths of both directions and propose HiLoMoE, a hierarchical LoRA MoE framework that enables holistic scaling in a parameter-efficient manner. Specifically, HiLoMoE employs lightweight rank-1 experts for parameter-efficient horizontal scaling, and stacks multiple MoE layers with hierarchical routing to enable combinatorially diverse expert compositions. Unlike conventional stacking, HiLoMoE routes based on prior-layer scores rather than outputs, allowing all layers to execute in parallel. A principled three-stage training framework ensures stable optimization and expert diversity. Experiments on four public datasets show that HiLoMoE achieves a better performance-efficiency tradeoff, with an average AUC improvement of 0.20\% and an 18.5\% reduction in FLOPs compared to the non-MoE baseline.
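The following is a minimal, hypothetical sketch (not the authors' implementation) of the two ideas named above: rank-1 LoRA experts, and stacked MoE layers whose routers condition on the previous layer's routing scores rather than its output, so that the expensive expert computations of all layers can operate on the shared input in parallel. All module and parameter names are illustrative assumptions.

\begin{verbatim}
# Hypothetical sketch of rank-1 LoRA experts with score-based hierarchical
# routing, under the assumptions stated above; not the HiLoMoE reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Rank1LoRAMoELayer(nn.Module):
    """One MoE layer whose experts are rank-1 LoRA updates."""
    def __init__(self, dim, n_experts, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Expert i contributes a rank-1 update: (b_i a_i^T) x = (a_i . x) b_i.
        self.A = nn.Parameter(torch.randn(n_experts, dim) * 0.02)  # "down" vectors
        self.B = nn.Parameter(torch.zeros(n_experts, dim))          # "up" vectors
        self.router = nn.Linear(dim, n_experts)

    def forward(self, x, prior_scores=None):
        # Route from the shared input and the previous layer's router scores,
        # not from the previous layer's output.
        logits = self.router(x)
        if prior_scores is not None:
            logits = logits + prior_scores
        scores = F.softmax(logits, dim=-1)                   # (batch, n_experts)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
        mask = torch.zeros_like(scores).scatter_(-1, topk_idx, topk_val)
        # Rank-1 expert outputs (a_i . x) * b_i, mixed by the routing weights.
        coeff = x @ self.A.t()                               # (batch, n_experts)
        delta = (mask * coeff) @ self.B                      # (batch, dim)
        return delta, logits

class HierarchicalLoRAMoE(nn.Module):
    """Stack of MoE layers: only the (cheap) routing scores are chained, while
    every layer's experts read the same input, so they can run in parallel."""
    def __init__(self, dim, n_layers=3, n_experts=8):
        super().__init__()
        self.layers = nn.ModuleList(
            Rank1LoRAMoELayer(dim, n_experts) for _ in range(n_layers)
        )

    def forward(self, x):
        deltas, scores = [], None
        for layer in self.layers:
            delta, scores = layer(x, prior_scores=scores)
            deltas.append(delta)
        return x + sum(deltas)
\end{verbatim}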