Studies on recommendations in social media have mainly analyzed the quality of recommended items (e.g., their diversity or biases) and the impact of recommendation policies (e.g., in comparison with purely chronological policies). We use a data donation program, collecting more than 2.5 million friend recommendations made to 682 volunteers on X over a year, to study instead how real-world recommenders learn, represent, and process the political and social attributes of users inside the so-called black boxes of AI systems. Using publicly available knowledge of the recommender's architecture, we inferred the positions of recommended users in its embedding space. Leveraging ideology scaling calibrated with political survey data, we analyzed the political positions of the users in our study (N=26,509 volunteers and recommended contacts), alongside other attributes including age and gender. Our results show that the platform's recommender system produces a spatial ordering of users that is highly correlated with their Left-Right positions (Pearson's r = 0.887, p < 0.0001) and that cannot be explained by socio-demographic attributes. These results open new possibilities for studying the interaction between humans and AI systems. They also raise important questions about the legal definition of algorithmic profiling in data privacy regulation by blurring the line between active and passive profiling. Building on these results, we explore new constrained recommendation methods that limit the political information available to the recommender, a potential tool for privacy compliance capable of preserving recommendation relevance.
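The headline correlation can be illustrated with a minimal sketch. This is not the paper's analysis pipeline: the ideology scores, the one-dimensional embedding projection, and all numbers below are synthetic stand-ins, used only to show how a Pearson correlation between an inferred embedding axis and Left-Right positions would be computed.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_users = 1000

# Hypothetical Left-Right ideology scores (e.g., calibrated against survey data)
left_right = rng.normal(0.0, 1.0, n_users)

# Hypothetical 1-D projection of each user's position in the recommender's
# embedding space: strongly aligned with ideology, plus noise
embedding_axis = 0.9 * left_right + rng.normal(0.0, 0.3, n_users)

# Pearson correlation between the embedding ordering and ideology
r, p_value = pearsonr(embedding_axis, left_right)
print(f"Pearson's r = {r:.3f}, p = {p_value:.2e}")
```

With the noise level chosen here, the synthetic correlation lands near the magnitude reported in the abstract; in the actual study, such an axis would be inferred from the recommender's architecture rather than constructed.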