As AI-generated and AI-assisted content floods online spaces, the source labels attached to it can distort human judgments of reasoning quality, with downstream consequences for moderation, evaluation, and decision-making. Whether LLMs share this vulnerability or offer more source-agnostic evaluation remains an open question with direct implications for human-AI collaboration. We examine this question using logical fallacies as a controlled setting that isolates source-label effects on judgments of reasoning quality, independent of domain knowledge. We conduct an online study (N=505) in which participants are assigned to a source condition (human, AI, human with AI assistance, AI with human assistance, or no disclosure) and evaluate comments containing logical fallacies. We compare their judgments with those of LLMs (GPT-5.2, Gemini 2.5 Flash, Claude Sonnet 4.5), which were evaluated under the same source conditions. Human evaluators were significantly more susceptible to fallacies labeled as written by a human or by a human with AI assistance, and they assigned higher trust and evaluation ratings in these conditions. LLM evaluations remained comparatively stable across source labels, though performance varied across models. Confidence levels were similarly high across conditions for both humans and LLMs, regardless of whether a fallacy was present. Our findings indicate that source-label bias in reasoning evaluation is primarily a human vulnerability, and they highlight the potential of human-LLM collaboration in increasingly AI-mediated environments.