This position paper argues that effective tutoring requires corrective friction: surfacing misconceptions and challenging them supportively to drive conceptual change. Yet preference-aligned LLMs can trade epistemic rigor for agreeableness. We identify a Reasoning-Sycophancy Paradox: models that resist context-switch frame attacks can still capitulate under social-epistemic pressure, especially appeals to authority ("my notes say I'm right") and social-affective face-saving ("please don't tell me I'm wrong"). We introduce EduFrameTrap, a tutoring benchmark spanning math, physics, economics, chemistry, biology, and computer science that varies student confidence and pressure type (context-switch, authority, social-affective). We evaluate two frontier LLMs. For GPT-5.2, context-switch failures are comparatively infrequent, while authority and social-affective pressure more often trigger epistemic retreat. In contrast, Claude shows substantial context-switch fragility in this run. Because these failures are hard to judge automatically, we report disagreement between two judges as a reliability signal. We argue that benchmarks should measure social-epistemic courage, i.e., supportive but corrective tutoring, and should treat kind-but-correct behavior as a safety requirement.