User modeling plays an essential role in industry. In this field, task-agnostic approaches, which generate general-purpose representations applicable to diverse downstream user cognition tasks, are a promising direction, being more valuable and economical than task-specific representation learning. With the rapid development of Internet service platforms, user behaviors have been accumulating continuously. However, existing research on general-purpose user representation has limited ability to model the full life cycle of a user, that is, the extremely long behavior sequence accumulated since registration. In this study, we propose a novel framework called the full-Life cycle User Representation Model (LURM) to tackle this challenge. Specifically, LURM consists of two cascaded sub-models: (I) Bag-of-Interests (BoI), which encodes user behaviors in any time period into a sparse vector of very high dimension (e.g., 10^5); and (II) the Self-supervised Multi-anchor Encoder Network (SMEN), which maps sequences of BoI features to multiple low-dimensional user representations. In particular, SMEN achieves almost lossless dimensionality reduction, benefiting from a novel multi-anchor module that can learn different aspects of user interests. Experiments on several benchmark datasets show that our approach outperforms state-of-the-art general-purpose representation methods.
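To make the BoI step more concrete, the following is a minimal Python sketch in the spirit of a bag-of-words encoding. It is not the authors' implementation. It assumes that each behavior has already been mapped to an integer interest ID, which is a hypothetical preprocessing step, and that counts within a period are L1-normalized, which is also an assumption. The names `bag_of_interests`, `boi_sequence`, and `VOCAB_SIZE` are hypothetical. Each vector is stored as a dictionary of its non-zero entries, so a feature space of around 10^5 dimensions costs only as much memory as the interests actually observed.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical vocabulary size; the abstract cites dimensions around 10^5.
VOCAB_SIZE = 100_000


def bag_of_interests(interest_ids: Iterable[int],
                     vocab_size: int = VOCAB_SIZE,
                     normalize: bool = True) -> Dict[int, float]:
    """Encode one time period's behaviors as a sparse interest-count vector.

    Assumption: every behavior was already mapped to an integer interest ID
    in [0, vocab_size). Normalization to unit L1 mass is also an assumption.
    """
    counts: Counter = Counter()
    for idx in interest_ids:
        if not 0 <= idx < vocab_size:
            raise ValueError(f"interest id {idx} outside vocabulary")
        counts[idx] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # an inactive period yields an all-zero (empty) vector
    if not normalize:
        return {idx: float(c) for idx, c in counts.items()}
    # Only non-zero entries are stored, so the vector stays sparse.
    return {idx: c / total for idx, c in counts.items()}


def boi_sequence(periods: List[List[int]]) -> List[Dict[int, float]]:
    """Turn a user's history, split into time periods, into a sequence of BoI vectors."""
    return [bag_of_interests(p) for p in periods]


if __name__ == "__main__":
    history = [[3, 3, 42], [42, 99_999], []]
    for vec in boi_sequence(history):
        print(vec)
```

Because each period is normalized in this sketch, periods with very different activity levels produce vectors on a comparable scale. A sequence of such vectors is the kind of input that SMEN would then compress into multiple low-dimensional representations.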