Learning general-purpose user representations from behavioral logs is an increasingly popular approach to user modeling. It benefits from data that is easily available, privacy-friendly, and yet expressive, and it does not require extensive re-tuning of the upstream user model for each downstream task. While this approach has shown promise in search engines and e-commerce applications, its fit for instant messaging platforms, a cornerstone of modern digital communication, remains largely unexplored. We explore this research gap using Snapchat data as a case study. Specifically, we implement a Transformer-based user model with customized training objectives and show that the model can produce high-quality user representations across a broad range of evaluation tasks. These include three new downstream tasks that we introduce, each concerning a pivotal topic in user research: user safety, engagement, and churn. We also tackle the challenge of efficient extrapolation to long sequences at inference time by applying a novel positional encoding method.