As large language models (LLMs) advance in linguistic competence, their reasoning abilities are attracting increasing attention. Human reasoning is often domain-specific, performing better in normative than in purely formal contexts. Although prior studies have compared LLM and human reasoning, the domain specificity of LLM reasoning remains underexplored. In this study, we introduce a new Wason Selection Task dataset that explicitly encodes deontic modality, systematically distinguishing deontic from descriptive conditionals, and use it to examine LLMs' conditional reasoning under deontic rules. We further analyze whether the observed error patterns are better explained by confirmation bias (a tendency to seek rule-supporting evidence) or by matching bias (a tendency to ignore negation and select items that lexically match elements of the rule). Results show that, like humans, LLMs reason more accurately with deontic rules and display matching-bias-like errors. Together, these findings suggest that LLM performance varies systematically across rule types and that LLM error patterns can parallel well-known human biases in this paradigm.
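To make the two bias hypotheses concrete, the sketch below shows one way a selection-task item and a bias-labeling heuristic could be encoded. This is an illustrative Python toy under assumed conventions, not the paper's actual dataset schema or analysis code; the `WasonItem` fields, the example rule, and `classify_errors` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical encoding of one Wason Selection Task item. The four cards
# correspond to P, not-P, Q, and not-Q for the rule "if P then Q".

@dataclass
class WasonItem:
    rule: str                          # the conditional "if P then Q"
    rule_type: str                     # "deontic" (must/may) vs. "descriptive"
    cards: tuple[str, str, str, str]   # faces for P, not-P, Q, not-Q
    correct: frozenset[str]            # normatively correct picks: {P, not-Q}

deontic_item = WasonItem(
    rule="If a person is drinking beer, they must be over 18.",
    rule_type="deontic",
    cards=("drinking beer", "drinking cola", "age 25", "age 16"),
    correct=frozenset({"drinking beer", "age 16"}),
)

def classify_errors(item: WasonItem, picks: frozenset[str]) -> str:
    """Heuristically label a selection pattern.

    Matching bias predicts picking the cards that lexically appear in the
    rule (P and Q), ignoring negation; confirmation bias predicts picking
    cards that could verify the rule. On affirmative rules the two
    predictions coincide, so analyses that separate them rely on negated
    conditionals; this toy check does not attempt that separation.
    """
    p, _not_p, q, _not_q = item.cards
    if picks == item.correct:
        return "correct (P and not-Q)"
    if picks == frozenset({p, q}):
        return "matching/confirmation pattern (P and Q)"
    return "other error"

print(classify_errors(deontic_item, frozenset({"drinking beer", "age 25"})))
# -> matching/confirmation pattern (P and Q)
```

Under this kind of encoding, disentangling the two biases would require items whose rules contain negations (e.g., "if P then not Q"), where matching bias still predicts picks of the lexically named cards while the normative and confirmation-seeking choices diverge from them.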