HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction

Industrial large-scale recommendation models (LRMs) must jointly model long-range user behavior sequences and heterogeneous non-sequential features under strict efficiency constraints. Most existing architectures, however, adopt a decoupled pipeline. Long sequences are first compressed by a query-token-based sequence compressor such as LONGER, and the result is then fused with dense features through a token-mixing module such as RankMixer. This separation limits both representation capacity and interaction flexibility. This paper presents HyFormer, a unified hybrid transformer architecture that tightly integrates long-sequence modeling and feature interaction into a single backbone. From the perspective of sequence modeling, we revisit and redesign the query tokens used in LRMs and frame LRM modeling as an alternating optimization process built on two core components. The first is Query Decoding, which expands non-sequential features into Global Tokens and performs long-sequence decoding over the layer-wise key-value representations of long behavioral sequences. The second is Query Boosting, which strengthens cross-query and cross-sequence heterogeneous interactions through efficient token mixing. The two complementary mechanisms alternate across layers, iteratively refining semantic representations. Extensive experiments on billion-scale industrial datasets show that HyFormer consistently outperforms strong LONGER and RankMixer baselines under comparable parameter and FLOPs budgets, and exhibits better scaling behavior as parameters and FLOPs increase. Large-scale online A/B tests in high-traffic production systems further confirm its effectiveness, with significant gains over the deployed state-of-the-art models. These results highlight the practicality and scalability of HyFormer as a unified modeling framework for industrial LRMs.
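The abstract describes the alternating layer only at a high level. The following is a minimal plain-Python sketch of how Query Decoding and Query Boosting might alternate, under several assumptions not stated in the source: Global Tokens come from one linear projection of the concatenated non-sequential features; Query Decoding is single-head scaled dot-product cross-attention from the Global Tokens over per-layer key/value projections of a single behavior sequence; and Query Boosting is RankMixer-style parameter-free token mixing followed by per-token feed-forward layers. All function names (`expand_global_tokens`, `query_decoding`, `token_mix`, `query_boosting`, `hyformer_sketch`) are hypothetical, and random weights stand in for learned parameters. This is an illustration, not the paper's implementation.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    """Random weights standing in for learned parameters (hypothetical)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def expand_global_tokens(features: list[float], w_expand: Matrix, num_tokens: int) -> Matrix:
    """Assumed: project non-sequential features into num_tokens Global Tokens."""
    flat = matmul([features], w_expand)[0]  # length num_tokens * d
    d = len(flat) // num_tokens
    return [flat[i * d:(i + 1) * d] for i in range(num_tokens)]


def query_decoding(queries: Matrix, seq: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix):
    """Assumed Query Decoding: Global Tokens cross-attend over this layer's
    key/value projections of the behavior sequence, with a residual add."""
    q = matmul(queries, w_q)  # T x d
    k = matmul(seq, w_k)      # L x d
    v = matmul(seq, w_v)      # L x d
    d = len(q[0])
    scores = matmul(q, [list(col) for col in zip(*k)])  # T x L
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return add(queries, matmul(weights, v)), weights


def token_mix(tokens: Matrix) -> Matrix:
    """RankMixer-style mixing (assumed): split each of T tokens into T chunks;
    output token i concatenates chunk i from every input token."""
    t, d = len(tokens), len(tokens[0])
    if d % t != 0:
        raise ValueError("hidden size must be divisible by the number of tokens")
    c = d // t
    return [[x for j in range(t) for x in tokens[j][i * c:(i + 1) * c]] for i in range(t)]


def query_boosting(tokens: Matrix, w_ffn: list[Matrix]) -> Matrix:
    """Assumed Query Boosting: token mixing, then a separate ReLU linear layer
    per token position, with a residual add."""
    mixed = token_mix(tokens)
    ffn = [[max(0.0, v) for v in matmul([tok], w)[0]] for tok, w in zip(mixed, w_ffn)]
    return add(tokens, ffn)


def hyformer_sketch(features: list[float], seq: Matrix, num_tokens: int = 4,
                    d: int = 8, num_layers: int = 2, seed: int = 0) -> Matrix:
    rng = random.Random(seed)
    e = len(seq[0])
    tokens = expand_global_tokens(features, rand_matrix(rng, len(features), num_tokens * d), num_tokens)
    for _ in range(num_layers):
        # Fresh K/V projections per layer stand in for layer-wise KV representations.
        tokens, _ = query_decoding(tokens, seq, rand_matrix(rng, d, d),
                                   rand_matrix(rng, e, d), rand_matrix(rng, e, d))
        tokens = query_boosting(tokens, [rand_matrix(rng, d, d) for _ in range(num_tokens)])
    return tokens


if __name__ == "__main__":
    rng = random.Random(42)
    features = [rng.random() for _ in range(6)]               # non-sequential features
    seq = [[rng.random() for _ in range(5)] for _ in range(20)]  # 20 behaviors, dim 5
    out = hyformer_sketch(features, seq)
    print(len(out), len(out[0]))  # prints: 4 8
```

In this sketch `token_mix` is a pure regrouping of chunks, so applying it twice returns the original tokens. That makes it a cheap, parameter-free way to exchange information across queries before the per-token feed-forward layers.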

翻译：工业级大规模推荐模型面临在严格效率约束下联合建模长程用户行为序列与异构非序列特征的挑战。然而，现有架构大多采用解耦式流程：首先通过基于查询-令牌的序列压缩器（如LONGER）对长序列进行压缩，随后通过令牌混合模块（如RankMixer）与稠密特征融合，这限制了模型的表示能力与交互灵活性。本文提出HyFormer——一种将长序列建模与特征交互紧密集成于单一主干网络的统一混合Transformer架构。从序列建模视角出发，我们重新审视并重新设计了大规模推荐模型中的查询令牌，将推荐建模任务构建为交替优化过程，该过程整合了两个核心组件：查询解码——将非序列特征扩展为全局令牌，并在长行为序列的层级键值表示上进行长序列解码；以及查询增强——通过高效的令牌混合机制加强跨查询与跨序列的异构交互。这两种互补机制通过迭代执行以逐层优化语义表示。基于十亿级工业数据集的广泛实验表明，在可比较的参数与FLOPs预算下，HyFormer始终优于强基准模型LONGER与RankMixer，并在参数与FLOPs增加时展现出更优的扩展性能。高流量生产系统中的大规模在线A/B测试进一步验证了其有效性，相比已部署的先进模型取得了显著提升。这些结果凸显了HyFormer作为工业级大规模推荐模型统一建模框架的实用性与可扩展性。