Large Language Models (LLMs) are widely used by students, yet their tendency to provide fast and complete answers may discourage reflection and foster overconfidence. We examined how alternative LLM interaction designs support deeper thinking without excessively increasing cognitive burden. We conducted a two-phase mixed-methods study. In Phase 1, interviews with 16 Gen Z students informed the design of Deep3, a web-based system with three interaction modes: \emph{a)} future-self explanations, \emph{b)} contrastive learning, and \emph{c)} guided hints. In Phase 2, we evaluated Deep3 with 85 participants across two learning tasks. We found that a standard single-agent baseline produced high perceived understanding despite the lowest objective learning. In contrast, future-self explanations imposed higher cognitive workload yet yielded the closest alignment between perceived and actual understanding, while guided hints achieved the largest learning gains without a proportional increase in frustration. These findings show that effort, confidence, and learning systematically diverge in LLM-supported work.