As AI-generated and AI-assisted content floods online spaces, source labels attached to such content can distort human judgments of reasoning quality, with downstream consequences for moderation, evaluation, and decision-making. Whether LLMs share this vulnerability or instead offer more source-agnostic evaluation remains an open question with direct implications for human-AI collaboration. We examine this question using logical fallacies as a controlled setting that isolates source-label effects on judgments of reasoning quality, independent of domain knowledge. We conduct an online study (N=505) in which participants are assigned to one of five source conditions (human, AI, human with AI assistance, AI with human assistance, or no disclosure) and evaluate comments containing logical fallacies. We compare their judgments with those of three LLMs (GPT-5.2, Gemini 2.5 Flash, Claude Sonnet 4.5), which were evaluated under the same source conditions. Human evaluators were significantly more susceptible to fallacies in comments labeled as written by a human or by a human with AI assistance, and they assigned higher trust and evaluation ratings in these conditions. LLM evaluations remained comparatively stable across source labels, though performance varied across models. Confidence levels were similarly high across conditions for both humans and LLMs, regardless of whether a fallacy was present. Our findings indicate that source-label bias in reasoning evaluation is primarily a human vulnerability and highlight the potential of human-LLM collaboration in increasingly AI-mediated environments.