In user targeting and audience expansion for new shows on a video platform, the key question is how the embeddings are generated. They should be personalized from the perspective of both users and shows. Furthermore, the pursuit of both instant (click) and long-term (view time) rewards, together with the cold-start problem for new shows, brings additional challenges. Heterogeneous graph models are well suited to this problem because the data has a natural graph structure. However, real-world networks usually have billions of nodes and many types of edges, and few existing methods focus on handling large-scale data and exploiting different edge types, especially the latter. In this paper, we propose a two-stage audience expansion scheme based on an edge-prompted heterogeneous graph network that takes different double-sided interactions and features into account. In the offline stage, the graph is constructed with user IDs and specific combinations of show side information as nodes, and with click/co-click relations and view time as the basis for edges. Embeddings and clustered user groups are then computed. When new shows arrive, their embeddings and the subsequent matching users can be produced within a consistent space. In the online stage, posterior data, including the users who clicked or viewed, are used as seeds to find similar users. Results on public datasets and our billion-scale data demonstrate the accuracy and efficiency of our approach.
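To make the two-stage flow concrete, here is a minimal sketch in Python. It is not the edge-prompted heterogeneous graph network itself. As a loudly simplified stand-in, it assumes each show node (a side-information combination) already has a fixed embedding, and it derives each user embedding as the view-time-weighted mean of the show nodes that user clicked. Clustering of user groups and co-click edges are omitted. All names, features, and numbers (`show_node_emb`, `edges`, `expand_audience`, the genre/region pairs) are hypothetical and chosen only for illustration.

```python
import numpy as np

def normalize(x):
    """L2-normalize rows; all-zero rows are left as zeros."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)

# Offline stage (toy stand-in): show nodes are side-information combinations,
# here hypothetical (genre, region) pairs, each with an assumed fixed embedding.
show_node_emb = {
    ("drama", "cn"): np.array([1.0, 0.0, 0.0]),
    ("comedy", "us"): np.array([0.0, 1.0, 0.0]),
    ("sports", "uk"): np.array([0.0, 0.0, 1.0]),
}

# Click edges weighted by view time in minutes: (user_id, show_node, view_time).
edges = [
    ("u1", ("drama", "cn"), 50.0),
    ("u1", ("comedy", "us"), 5.0),
    ("u2", ("drama", "cn"), 30.0),
    ("u3", ("comedy", "us"), 40.0),
    ("u4", ("sports", "uk"), 60.0),
    ("u5", ("drama", "cn"), 20.0),
    ("u5", ("sports", "uk"), 5.0),
]

def user_embeddings(edges, node_emb):
    """View-time-weighted mean of the show-node embeddings each user clicked."""
    sums = {}
    for user, node, view_time in edges:
        sums[user] = sums.get(user, 0.0) + view_time * node_emb[node]
    users = sorted(sums)
    return users, normalize(np.stack([sums[u] for u in users]))

def expand_audience(query, users, user_emb, exclude=(), k=2):
    """Return the k users most cosine-similar to the query vector."""
    scores = user_emb @ normalize(query[None, :])[0]
    ranked = [users[i] for i in np.argsort(-scores) if users[i] not in exclude]
    return ranked[:k]

users, user_emb = user_embeddings(edges, show_node_emb)

# Cold start: a new show with known side information maps onto an existing
# show node, so its embedding lives in the same space as the user embeddings.
new_show_vec = show_node_emb[("drama", "cn")]
cold_start_audience = expand_audience(new_show_vec, users, user_emb, k=3)

# Online stage: users who clicked or viewed the new show become seeds, and
# the seed centroid is used to look for similar users.
seeds = ["u2"]
seed_centroid = user_emb[[users.index(u) for u in seeds]].mean(axis=0)
lookalikes = expand_audience(seed_centroid, users, user_emb, exclude=seeds, k=2)
```

In this toy setup the cold-start query is the new show's own node embedding, while the online query is the centroid of the seed users. Both are matched against the same user embedding table, which is the sense in which the space is "consistent". A production system at billion scale would replace the brute-force dot product with an approximate nearest-neighbor index.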