Conceptual analysis, the practice of proposing definitions and refining them through counterexamples, is central to philosophical methodology. We study whether language models (LMs) can perform this task through iterated analysis-and-repair chains: one model instance generates counterexamples to a proposed definition, another repairs the definition, and the process repeats. Across 20 concepts and thousands of counterexample-repair cycles, we find that, although many LM-generated counterexamples are judged invalid by both human experts and an LM judge, the LM judge accepts roughly twice as many as the humans do. Nonetheless, per-item validity judgments are moderately consistent across humans and between humans and the LM judge. We further find that extended iteration produces increasingly verbose definitions without improving accuracy, and that some concepts resist stable definitions in general. These findings suggest that, while LMs can engage in philosophical reasoning, the counterexample-repair loop hits diminishing returns quickly and could be a fruitful test case for evaluating whether LMs can sustain high-level iterated philosophical reasoning.