Silicon samples, i.e., survey responses simulated with large language models, are increasingly used as a low-cost substitute for human panels and have been shown to reproduce aggregate human opinion with high fidelity. We show that, in the alignment-relevant domain of philosophy, silicon samples systematically collapse heterogeneity. Using data from $N = 277$ professional philosophers drawn from PhilPeople profiles, we evaluate seven proprietary and open-source large language models on two abilities: replicating individual philosophical positions, and preserving cross-question correlation structures across philosophical domains. We find that language models substantially over-correlate philosophical judgments, producing artificial consensus across domains. This collapse is associated in part with specialist effects, whereby models implicitly assume that domain specialists hold highly similar philosophical views. We assess the robustness of these findings by examining the effect of DPO fine-tuning and by validating our results against the full PhilPapers 2020 Survey ($N = 1785$). We conclude by discussing implications for alignment, evaluation, and the use of silicon samples as substitutes for human judgment. Code for this project is available at https://github.com/stanford-del/silicon-philosophers.
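The abstract reports that models "over-correlate" philosophical judgments. One simple way to quantify this is to compare the average absolute off-diagonal Pearson correlation between questions in a human response matrix and in a silicon response matrix. The sketch below illustrates that idea. It is not the authors' metric. It assumes answers are numerically coded in a respondents × questions array, and the function names are hypothetical.

```python
import numpy as np


def mean_abs_offdiag_corr(responses: np.ndarray) -> float:
    """Mean absolute pairwise Pearson correlation between questions.

    responses: shape (n_respondents, n_questions), numerically coded answers
    (hypothetical coding, e.g. accept=1, lean=0.5, reject=0).
    """
    # Constant columns make Pearson correlation undefined (NaN), so reject them.
    if np.any(np.std(responses, axis=0) == 0):
        raise ValueError("every question needs non-zero variance across respondents")
    corr = np.corrcoef(responses, rowvar=False)  # (n_questions, n_questions)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)  # drop self-correlations of 1.0
    return float(np.mean(np.abs(corr[off_diag])))


def overcorrelation_ratio(human: np.ndarray, silicon: np.ndarray) -> float:
    """Values > 1 mean the silicon sample's answers are more tightly coupled."""
    return mean_abs_offdiag_corr(silicon) / mean_abs_offdiag_corr(human)


def corr_structure_distance(human: np.ndarray, silicon: np.ndarray) -> float:
    """Frobenius norm of the difference between the two correlation matrices."""
    diff = np.corrcoef(human, rowvar=False) - np.corrcoef(silicon, rowvar=False)
    return float(np.linalg.norm(diff, ord="fro"))


if __name__ == "__main__":
    # Synthetic illustration only, not survey data: "human" answers are
    # independent across questions, while "silicon" answers share one latent factor.
    rng = np.random.default_rng(0)
    human = rng.normal(size=(500, 10))
    silicon = rng.normal(size=(500, 1)) + 0.3 * rng.normal(size=(500, 10))
    print(overcorrelation_ratio(human, silicon))
    print(corr_structure_distance(human, silicon))
```

In the synthetic example, a single shared factor drives every "silicon" answer. That is a toy version of the artificial cross-domain consensus described above, and it pushes the ratio well above 1.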