Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. According to data modality and input format, models for CTR prediction fall into two main categories. The first comprises traditional CTR models that take as input the one-hot encoded ID features of the tabular modality and aim to capture collaborative signals via feature interaction modeling. The second takes as input sentences of the textual modality obtained through hard prompt templates, where pretrained language models (PLMs) are adopted to extract semantic knowledge. These two lines of research generally focus on different characteristics of the same input data (i.e., the textual and tabular modalities), forming a distinctly complementary relationship. Therefore, in this paper, we propose fine-grained, feature-level Alignment between Language and CTR models (ALT) for CTR prediction. Beyond the common CLIP-like instance-level contrastive learning, we design a novel joint reconstruction pretraining task for both masked language and masked tabular modeling. Specifically, the masked data of one modality (i.e., tokens or feature fields) must be recovered with the help of the other modality, which establishes feature-level interaction and alignment through sufficient mutual information extraction between the two modalities. Moreover, we propose three finetuning strategies that allow the aligned language and CTR models to be trained separately or jointly on downstream CTR prediction tasks, thus accommodating the varying efficacy and efficiency requirements of industrial applications. Extensive experiments on three real-world datasets demonstrate that ALT outperforms SOTA baselines and is highly compatible with various language and CTR models.
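To make the second input format concrete, below is a minimal sketch of hard-prompt templating: the tabular ID features of one sample are verbalized into a sentence a PLM can consume. The template wording and field names are illustrative assumptions, not the paper's exact prompt.

```python
# A hedged sketch of hard-prompt templating for CTR data. The template text
# ("The <field> is <value>, ...") and the example fields are assumptions made
# for illustration only; the paper's actual prompt format may differ.

def to_prompt(sample: dict) -> str:
    """Serialize one tabular CTR sample into a hard-prompt sentence."""
    clauses = [f"{field} is {value}" for field, value in sample.items()]
    return "The " + ", the ".join(clauses) + "."

sample = {
    "user id": "u_1024",
    "gender": "female",
    "movie title": "Titanic",
    "genre": "romance",
}
print(to_prompt(sample))
# -> The user id is u_1024, the gender is female,
#    the movie title is Titanic, the genre is romance.
```

The same sample thus feeds both branches: its one-hot encoded fields go to the CTR model, and the serialized sentence goes to the PLM.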
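The pretraining objective combines the CLIP-like instance-level contrastive loss with the joint masked-reconstruction task described above. Below is a hedged PyTorch-style sketch under assumed interfaces: `lm_encoder`, `ctr_encoder`, the two prediction heads, and the `random_mask` helper are hypothetical placeholders standing in for the paper's actual modules, and cross-modal conditioning is shown simply as a second argument to each head.

```python
# A minimal sketch of the joint pretraining step, assuming placeholder modules.
# Not the paper's implementation: encoders, heads, and masking are illustrative.
import torch
import torch.nn.functional as F

MASK_ID = 0      # assumed id reserved for the [MASK] token / masked feature
IGNORE = -100    # standard ignore index for cross-entropy

def random_mask(ids: torch.Tensor, p: float = 0.15):
    """Mask ~p of the positions; labels keep originals only at masked slots."""
    mask = torch.rand_like(ids, dtype=torch.float) < p
    labels = ids.masked_fill(~mask, IGNORE)
    return ids.masked_fill(mask, MASK_ID), labels

def pretrain_step(tokens, fields, lm_encoder, ctr_encoder,
                  token_head, feature_head, tau: float = 0.07):
    # Mask both modalities independently: tokens (text) and feature fields (tabular).
    masked_tokens, token_labels = random_mask(tokens)
    masked_fields, field_labels = random_mask(fields)

    h_text = lm_encoder(masked_tokens)   # [B, T, d] token-level hidden states
    h_tab = ctr_encoder(masked_fields)   # [B, F, d] feature-level hidden states

    # (1) CLIP-like instance-level contrastive loss on pooled embeddings.
    z_text = F.normalize(h_text[:, 0], dim=-1)       # [CLS]-style pooling
    z_tab = F.normalize(h_tab.mean(dim=1), dim=-1)   # mean-pooled fields
    logits = z_text @ z_tab.t() / tau
    targets = torch.arange(logits.size(0), device=logits.device)
    l_cl = (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

    # (2) Masked language modeling: masked tokens are recovered with the
    # help of the tabular states (the other modality).
    token_logits = token_head(h_text, h_tab)          # [B, T, vocab]
    l_mlm = F.cross_entropy(token_logits.flatten(0, 1),
                            token_labels.flatten(), ignore_index=IGNORE)

    # (3) Masked tabular modeling: masked fields are recovered with the
    # help of the textual states.
    field_logits = feature_head(h_tab, h_text)        # [B, F, feat_vocab]
    l_mtm = F.cross_entropy(field_logits.flatten(0, 1),
                            field_labels.flatten(), ignore_index=IGNORE)

    return l_cl + l_mlm + l_mtm
```

Because each head can only recover its masked positions by attending to the other modality's states, minimizing the reconstruction losses forces feature-level mutual information extraction between the two branches, which is the alignment effect the abstract describes.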