Effective personalization on large-scale job platforms requires modeling members from heterogeneous textual sources, including profiles, professional data, and search activity logs. As recommender systems increasingly adopt Large Language Models (LLMs), creating unified, interpretable, and concise representations from these heterogeneous sources becomes critical, especially in latency-sensitive online environments. In this work, we propose a novel Reinforcement Learning (RL) framework that synthesizes a unified textual representation for each member. Our approach leverages implicit user engagement signals (e.g., clicks, applies) as the primary reward to distill salient information, and complements this with rule-based rewards that enforce formatting and length constraints. Extensive offline experiments across multiple products at LinkedIn, one of the world's largest job platforms, demonstrate significant improvements in key downstream business metrics. This work provides a practical, labeling-free, and scalable solution for constructing interpretable user representations that are directly compatible with LLM-based systems.