Deferred is Better: A Framework for Multi-Granularity Deferred Interaction of Heterogeneous Features

Click-through rate (CTR) prediction models estimate the probability of a user-item click by modeling interactions across a vast feature space. A fundamental yet often overlooked challenge is the inherent heterogeneity of these features: their sparsity and information content vary dramatically. For instance, categorical features such as item IDs are extremely sparse, whereas numerical features such as item price are relatively dense. Prevailing CTR models have largely ignored this heterogeneity, employing a uniform feature interaction strategy that feeds all features into the interaction layers simultaneously. This approach is suboptimal: introducing low-information features prematurely can inject significant noise and mask the signals from information-rich features, which leads to model collapse and hinders the learning of robust representations. To address this challenge, we propose the Multi-Granularity Information-Aware Deferred Interaction Network (MGDIN), which adaptively defers the introduction of features into the feature interaction process. MGDIN's core mechanism operates in two stages. First, a multi-granularity feature grouping strategy partitions the raw features, at several granularities, into groups with more homogeneous information density, thereby mitigating the effects of extreme individual feature sparsity and enabling the model to capture feature interactions from diverse perspectives. Second, a deferred interaction mechanism, implemented through a hierarchical masking strategy, governs when and how each group participates: low-information groups are masked in the early layers and progressively unmasked as the network deepens. This deferred introduction allows the model to establish a robust understanding based on high-information features before gradually incorporating sparser information from other groups...
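
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says MGDIN defers features adaptively, whereas this sketch uses a fixed linear unmasking schedule for simplicity. The per-feature information scores, the function names `group_features` and `deferred_mask`, and the example feature names are all hypothetical.

```python
from typing import Dict, List


def group_features(info_score: Dict[str, float], num_groups: int) -> List[List[str]]:
    """Rank features by an (assumed) information score and split them into
    `num_groups` contiguous groups, most informative group first."""
    ranked = sorted(info_score, key=lambda name: info_score[name], reverse=True)
    base, extra = divmod(len(ranked), num_groups)
    groups: List[List[str]] = []
    start = 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # spread the remainder over the first groups
        groups.append(ranked[start:start + size])
        start += size
    return groups


def deferred_mask(num_groups: int, num_layers: int) -> List[List[int]]:
    """Return mask[layer][group]; 1 means the group takes part in that layer.
    Assumption: group g is unlocked at layer floor(g * num_layers / num_groups),
    so lower-information groups join later and every group is active in the last layer."""
    mask: List[List[int]] = []
    for layer in range(num_layers):
        row = []
        for g in range(num_groups):
            unlock_layer = (g * num_layers) // num_groups
            row.append(1 if layer >= unlock_layer else 0)
        mask.append(row)
    return mask


# Hypothetical scores: dense numerical features score high, sparse IDs score low.
scores = {"item_price": 0.9, "user_age": 0.8, "item_category": 0.5,
          "user_id": 0.1, "item_id": 0.05}

# "Multi-granularity" is read here as grouping the same features at several group counts.
for k in (2, 3):
    print(k, group_features(scores, k), deferred_mask(k, num_layers=3))
```

In a real interaction network, each mask row would typically gate which group embeddings are visible to the corresponding interaction layer, for example by zeroing masked groups or excluding them from attention.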
