Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. Traditional ID-based models for CTR prediction take as input the one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, one-hot encoding discards the semantic information contained in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as input sentences of the textual modality obtained through hard prompt templates and adopts PLMs to extract semantic knowledge. PLMs, however, generally tokenize the input text into subword tokens and ignore field-wise collaborative signals. These two lines of research therefore focus on different characteristics of the same input data (i.e., the textual and tabular modalities) and are complementary to each other. In this paper, we propose Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task that covers both masked language modeling and masked tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM for downstream CTR prediction tasks, thus achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms state-of-the-art (SOTA) baselines and is highly compatible with various ID-based models and PLMs.
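To make the two modalities and the cross-modal masking idea concrete, the following is a minimal sketch, not the authors' implementation. It renders one tabular record as a sentence with a hard prompt template, maps the same record to integer feature IDs, and builds the two input pairs used for reconstruction: masked text with complete IDs, and complete text with masked IDs. The template wording (`field is value.`), the `[MASK]` placeholder, the `mask_id` value, and all function names are assumptions made for illustration.

```python
MASK_TOKEN = "[MASK]"  # assumed placeholder for a masked field value in text
MASK_ID = -1           # assumed placeholder for a masked feature ID


def to_text(record):
    """Render a record as a sentence with a hard prompt template (assumed wording)."""
    return " ".join(f"{field} is {value}." for field, value in record.items())


def to_ids(record, vocab):
    """Map each (field, value) pair to an integer ID, extending vocab when needed."""
    return [vocab.setdefault((field, value), len(vocab))
            for field, value in record.items()]


def build_masked_pairs(record, vocab, field):
    """Build both reconstruction inputs for a single masked field.

    Returns (masked_text, full_ids) for masked language modeling and
    (full_text, masked_ids) for masked tabular modeling, so that each
    masked side can only be recovered by consulting the other modality.
    """
    full_text = to_text(record)
    full_ids = to_ids(record, vocab)

    # Mask the whole field value in the text, i.e. at feature level.
    masked_record = {f: (MASK_TOKEN if f == field else v) for f, v in record.items()}
    masked_text = to_text(masked_record)

    # Mask the same field in the ID sequence.
    position = list(record).index(field)
    masked_ids = [MASK_ID if i == position else x for i, x in enumerate(full_ids)]

    return (masked_text, full_ids), (full_text, masked_ids)


record = {"user_id": "u42", "genre": "thriller", "title": "Heat"}
vocab = {}
mlm_pair, mtm_pair = build_masked_pairs(record, vocab, "genre")
print(mlm_pair)
print(mtm_pair)
```

In a real pipeline the text would then be tokenized into subwords by the PLM and the IDs embedded by the ID-based model; the sketch stops at input construction because the abstract does not specify those details.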