Because traits such as identities, cultures, and political attitudes are correlated, seemingly innocuous preferences, such as following a band or using specific slang, can reveal private traits. This possibility, especially when combined with massive public social data and advanced computational methods, poses a fundamental privacy risk. As our growing online data exposure and the rapid advancement of AI amplify the potential for misuse, it is critical to understand the capacity of large language models (LLMs) to exploit such correlations. Here, using online discussions from Debate.org and Reddit, we show that LLMs can reliably infer hidden political alignment, significantly outperforming traditional machine learning models. Prediction accuracy further improves as we aggregate multiple text-level inferences into a user-level prediction and as we draw on more politics-adjacent domains. We demonstrate that LLMs leverage words that are highly predictive of political alignment without being explicitly political. Our findings underscore the capacity of LLMs to exploit socio-cultural correlates, and the risks this entails.
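To make the aggregation step concrete, the following is a minimal sketch of turning per-text inferences into a user-level prediction. The majority-vote rule and the predict_alignment() helper are illustrative assumptions for exposition, not the paper's exact aggregation method.

```python
# Sketch: aggregate independent per-text predictions into one user-level label.
# Assumption: a simple majority vote over text-level labels; the paper's actual
# aggregation rule may differ.
from collections import Counter
from typing import Callable, Iterable


def aggregate_user_alignment(
    texts: Iterable[str],
    predict_alignment: Callable[[str], str],  # hypothetical per-text LLM classifier
) -> str:
    """Classify each text independently, then return the most common label."""
    labels = [predict_alignment(t) for t in texts]
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Stub classifier standing in for an LLM call, for demonstration only.
    stub = lambda text: "liberal" if "union" in text else "conservative"
    posts = ["support the union drive", "lower taxes now", "union rights matter"]
    print(aggregate_user_alignment(posts, stub))  # -> "liberal"
```

Intuitively, aggregating many noisy text-level signals reduces variance, which is consistent with the reported accuracy gains at the user level.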