Click-Through Rate (CTR) prediction is one of the most critical tasks in product and content recommendation, and learning effective feature interactions is the key challenge in capturing user preferences for products. Recent work has focused on more sophisticated feature interactions based on soft attention or gating mechanisms, yet these models still introduce redundant or contradictory feature combinations. According to the Global Workspace Theory of conscious processing, humans click on advertisements ``consciously'': only a specific subset of product features is considered, and the rest are not involved in conscious processing. We therefore propose a CTR model that \textbf{D}irectly \textbf{E}nhances the embeddings and \textbf{L}everages \textbf{T}runcated Conscious \textbf{A}ttention during feature interaction, termed DELTA. It contains two key components: (I) a conscious truncation module (CTM), which uses curriculum learning to apply adaptive truncation to the attention weights and thereby select the most critical feature combinations; (II) a direct embedding enhancement module (DEM), which directly and independently propagates gradients from the loss layer to the embedding layer, enhancing the crucial embeddings via linear feature crossing without introducing any extra cost during inference. Extensive experiments on five challenging CTR datasets demonstrate that DELTA achieves cutting-edge performance relative to current state-of-the-art CTR methods.
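The abstract does not say how CTM schedules or applies its truncation. The following is a minimal, dependency-free Python sketch of the general idea under stated assumptions, not DELTA's actual implementation. It assumes (1) a hypothetical linear curriculum, `keep_ratio`, that starts by keeping every candidate feature combination and anneals toward keeping only a fraction `end` of them, and (2) hard top-k truncation of the softmax attention weights followed by renormalization. All function names and the `start`/`end` values are illustrative assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keep_ratio(step: int, total_steps: int,
               start: float = 1.0, end: float = 0.5) -> float:
    """Hypothetical linear curriculum: keep all combinations at step 0 and
    anneal linearly to keeping only `end` of them at `total_steps`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def truncate_attention(weights: List[float], ratio: float) -> List[float]:
    """Keep the ceil(ratio * n) largest weights, zero the rest, and
    renormalize so the surviving weights sum to 1."""
    n = len(weights)
    k = max(1, math.ceil(ratio * n))
    # Indices of the k largest weights; ties go to the lower index.
    kept = set(sorted(range(n), key=lambda i: (-weights[i], i))[:k])
    truncated = [w if i in kept else 0.0 for i, w in enumerate(weights)]
    total = sum(truncated)  # > 0 because softmax weights are positive
    return [w / total for w in truncated]


def conscious_attention(scores: List[float], values: List[List[float]],
                        step: int, total_steps: int) -> List[float]:
    """Attention-weighted sum over candidate feature-combination vectors,
    with the curriculum-controlled truncation applied to the weights."""
    weights = truncate_attention(softmax(scores), keep_ratio(step, total_steps))
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Usage: four candidate combinations with 2-d representations.
scores = [2.0, 0.5, 1.0, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
early = conscious_attention(scores, values, step=0, total_steps=100)   # all kept
late = conscious_attention(scores, values, step=100, total_steps=100)  # top 2 kept
```

In this sketch, early training attends over every combination, which is the denser and easier setting. As training proceeds, low-weight combinations are zeroed out, which mirrors the idea that only a subset of features enters conscious processing. The abstract describes the truncation as adaptive, so a learned threshold would be a closer fit than the fixed top-k schedule assumed here.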