Hundreds of millions of people now encounter contested political questions through large language models, which raises a subtle measurement problem: a model that simply agrees with whatever it is told can masquerade as a biased one, contaminating any claim that models hold political opinions. We address this by importing balanced keying from survey psychometrics: each proposition is posed together with its reverse, in which the two sides are swapped, and each response is signed so that acquiescence cancels while genuine conviction accumulates. The result is a reproducible, quantitative instrument that maps geopolitical stance across 11 models and 2 languages (19,712 responses). Developer origin, query language, and issue domain emerge as three near-equal, additive factors; every model, including those built in the United States, leans more pro-China in Mandarin; and the instrument separates two models with identical agreement bias, one neutral and one biased. We release the instrument as an open, interactive tool that extends to any contested-opinion domain.
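The cancellation behind balanced keying can be illustrated with a minimal sketch. It is not the scoring code used in this work: the function name `score_pairs`, the agreement scale from -2 (strongly disagree) to +2 (strongly agree), and the example responses are all assumptions made for illustration. For each proposition and its reverse, half the difference of the two responses estimates stance, and half their sum estimates acquiescence.

```python
from statistics import mean


def score_pairs(pairs):
    """Score balanced-keyed item pairs.

    `pairs` is a list of (forward, reverse) responses, each on an assumed
    agreement scale from -2 (strongly disagree) to +2 (strongly agree).
    Agreeing with the forward item favours one side; agreeing with the
    reverse item favours the other side.
    """
    # Signing the reverse item negatively makes a uniform tendency to agree cancel.
    stance = mean((fwd - rev) / 2 for fwd, rev in pairs)
    # The shared component of both responses is the tendency to agree regardless of content.
    acquiescence = mean((fwd + rev) / 2 for fwd, rev in pairs)
    return stance, acquiescence


# Hypothetical model A: mildly agrees with both a proposition and its reverse.
model_a = [(1, 1), (1, 1)]
# Hypothetical model B: strongly agrees with the forward item, is neutral on the reverse.
model_b = [(2, 0), (2, 0)]

stance_a, acq_a = score_pairs(model_a)
stance_b, acq_b = score_pairs(model_b)
print(f"A: stance={stance_a}, acquiescence={acq_a}")
print(f"B: stance={stance_b}, acquiescence={acq_b}")
```

In this toy case both hypothetical models show the same acquiescence score, yet only model B has a non-zero stance, which is the kind of separation described above.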