FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction

Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. Traditional ID-based models for CTR prediction take as input the one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, one-hot encoding discards the semantic information contained in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as input sentences of the textual modality obtained from hard prompt templates and adopts PLMs to extract semantic knowledge. PLMs, in turn, generally tokenize the input text into subword tokens and ignore field-wise collaborative signals. These two lines of research therefore focus on different characteristics of the same input data (i.e., the textual and tabular modalities) and are distinctly complementary to each other. In this paper, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task for both masked language modeling and masked tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM for downstream CTR prediction tasks, thus achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms state-of-the-art baselines and is highly compatible with various ID-based models and PLMs.
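To make the two modalities and the cross-modal masking idea concrete, the following is a minimal sketch in plain Python rather than the authors' implementation. It assumes a hypothetical "field is value." prompt template, hypothetical mask symbols `[MASK]` and `<MASK>`, and an invented example row; it also masks a whole field value with a single placeholder, whereas a real PLM would mask the subword tokens of that value.

```python
import random

def row_to_text(row):
    """Render a tabular sample as a sentence with a hard prompt template.

    The "field is value." template is an assumption made for illustration.
    """
    return " ".join(f"{field} is {value}." for field, value in row.items())

def mask_field_in_text(row, field, mask_token="[MASK]"):
    """Textual view with one field's value hidden (masked language side)."""
    masked = {k: (mask_token if k == field else v) for k, v in row.items()}
    return row_to_text(masked)

def mask_field_in_table(row, field, mask_id="<MASK>"):
    """Tabular view with one field's feature hidden (masked tabular side)."""
    return {k: (mask_id if k == field else v) for k, v in row.items()}

def make_pretraining_pair(row, rng):
    """Build both reconstruction inputs for one randomly chosen field.

    Each input pairs a masked view of one modality with the complete view
    of the other, so the hidden value can only be recovered cross-modally.
    """
    field = rng.choice(sorted(row))
    return {
        "masked_field": field,
        # masked text + complete table: recover the tokens
        "mlm_input": (mask_field_in_text(row, field), dict(row)),
        # complete text + masked table: recover the feature ID
        "mtm_input": (row_to_text(row), mask_field_in_table(row, field)),
        "target": row[field],
    }

# Hypothetical sample; field names and values are invented for illustration.
row = {"user_id": "u42", "gender": "female",
       "movie_title": "Titanic", "genre": "Romance"}
pair = make_pretraining_pair(row, random.Random(0))
print(pair["mlm_input"][0])
print(pair["mtm_input"][1])
```

In this sketch each training example carries a masked view of one modality together with the intact view of the other, which mirrors the abstract's requirement that masked tokens or features be recovered with the help of the other modality. How the two encoders exchange information is left out, since the abstract does not specify it.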

