As large language model (LLM)-based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Without this ability, such systems may rapidly generate plausible yet misleading studies that contaminate the academic literature and erode trust in science. We present PseudoBench, an adversarial benchmark for evaluating whether agentic auto-research systems can identify and resist pseudoscientific narratives. PseudoBench contains 200 curated pseudoscientific claim-evidence pairs across five domains and evaluates agents through an end-to-end research pipeline, from experiments to writing. Testing seven state-of-the-art agents, we find that current systems readily produce persuasive reports aligned with pseudoscientific premises, with near-zero refusal rates; even the most resistant agent reaches a resistance rate of only 27.4%. Stronger agents risk packaging pseudoscience in more sophisticated scientific language, increasing its apparent credibility. These findings reveal an alarming capacity of current agents to fuel pseudoscience and call for scientific alignment before widespread deployment.