Learning effective feature interactions is central to modern recommender systems, yet it remains challenging in industrial settings because of sparse multi-field inputs and ultra-long user behavior sequences. Recent scaling efforts have improved model capacity, but they often fail to construct both context-aware and context-independent user intent from long-term and real-time behavior sequences. They also rely on inefficient, homogeneous interaction mechanisms, which leads to suboptimal prediction performance. To address these limitations, we propose HeMix, a scalable ranking model that unifies adaptive sequence tokenization with a heterogeneous interaction structure. Specifically, HeMix introduces a Query-Mixed Interest Extraction module that jointly models context-aware and context-independent user interests by applying dynamic and fixed queries to global and real-time behavior sequences. For interaction, we replace self-attention with the HeteroMixer block, which enables efficient, multi-granularity cross-feature interactions through a pipeline of multi-head token fusion, heterogeneous interaction, and group-aligned reconstruction. Driven by the HeteroMixer block, HeMix exhibits favorable scaling behavior: increasing model scale through parameter expansion yields steady improvements in recommendation accuracy. Experiments on industrial-scale datasets show that HeMix scales effectively and consistently outperforms strong baselines. Most importantly, HeMix has been deployed on the AMAP platform, where it delivers significant online gains over DLRM: +3.61\% GMV, +2.78\% PV\_CTR, and +2.12\% UV\_CVR.
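To make the idea of mixing fixed and dynamic queries concrete, the following is a minimal sketch, not the HeMix implementation. The abstract does not specify the attention form or how dynamic queries are built, so this sketch assumes scaled dot-product attention pooling and a dynamic query formed by a linear projection of a request-context vector. All names (`attend`, `query_mixed_tokens`, `proj`, `context`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, sequence):
    """Pool `sequence` into one vector with scaled dot-product attention."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in sequence])
    dim = len(sequence[0])
    return [sum(w * item[d] for w, item in zip(weights, sequence))
            for d in range(dim)]


def query_mixed_tokens(fixed_queries, context, proj, global_seq, realtime_seq):
    """Return interest tokens from fixed and dynamic queries over both sequences."""
    # Assumption: the dynamic query is a linear projection of the request context.
    dynamic_query = [dot(row, context) for row in proj]
    tokens = []
    for seq in (global_seq, realtime_seq):
        # Context-independent interests: fixed queries ignore the request.
        tokens.extend(attend(q, seq) for q in fixed_queries)
        # Context-aware interest: the query changes with the request context.
        tokens.append(attend(dynamic_query, seq))
    return tokens


# Toy data: random embeddings stand in for behavior-item embeddings.
rng = random.Random(0)
dim = 4


def rand_vec():
    return [rng.gauss(0, 1) for _ in range(dim)]


global_seq = [rand_vec() for _ in range(50)]    # long-term behaviors
realtime_seq = [rand_vec() for _ in range(5)]   # recent behaviors
fixed_queries = [rand_vec() for _ in range(2)]  # would be learned in practice
proj = [rand_vec() for _ in range(dim)]         # would be learned in practice
tokens = query_mixed_tokens(fixed_queries, rand_vec(), proj,
                            global_seq, realtime_seq)
print(len(tokens), len(tokens[0]))  # 6 tokens of dimension 4
```

Under these assumptions, each sequence is compressed to a small, fixed number of tokens regardless of its length (here, two fixed-query tokens plus one dynamic-query token per sequence), which is what would let a downstream interaction block operate on a short token list rather than the raw behavior sequence.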