The democratization of large language models (LLMs) has given rise to vibe coding, in which novice programmers prioritize semantic intent over syntactic implementation. We argue that, without pedagogical guardrails, this practice is fundamentally misaligned with cognitive skill acquisition. Drawing on Kirschner's distinction between cognitive offloading and outsourcing, we contend that unrestricted AI encourages novices to outsource the intrinsic cognitive load required for schema formation rather than merely offloading extraneous load. The resulting accumulation of epistemic debt creates fragile experts: developers whose high functional utility masks critically low corrective competence. To quantify and mitigate this debt, we conducted a between-subjects experiment (N=78) using a custom Cursor IDE plugin backed by Claude 3.5 Sonnet. Participants were recruited via Prolific and UserInterviews.com to represent AI-native learners. We compared three conditions: manual (control), unrestricted AI (outsourcing), and scaffolded AI (offloading). The scaffolded condition employed a novel Explanation Gate: a real-time LLM-as-a-Judge framework that enforces a teach-back protocol before generated code can be integrated. Results reveal a collapse of competence. Both AI groups significantly outperformed the manual control on functional utility (p < .001) and did not differ from each other (p = .64), yet unrestricted AI users suffered a 77% failure rate on a subsequent 30-minute AI-blackout maintenance task, compared with only 39% in the scaffolded group. Qualitative analysis suggests that successful vibe coders naturally self-scaffold, treating AI as a consultant rather than a contractor. We discuss implications for the maintainability of AI-generated software and propose that future learning systems must enforce metacognitive friction to prevent the mass production of unmaintainable code. Replication package: https://github.com/sreecharansankaranarayanan/vibecheck
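The control flow of a teach-back gate like the Explanation Gate can be illustrated with a minimal sketch. This is not the plugin's implementation: the abstract describes a real-time LLM-as-a-Judge, whereas the sketch below swaps in a toy keyword-overlap heuristic so that it runs offline. The names `ExplanationGate`, `keyword_judge`, and `submit`, as well as the 0.5 pass threshold, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical judge signature: (code, explanation) -> score in [0, 1].
# In the study this role is played by an LLM; here it is any callable.
Judge = Callable[[str, str], float]

PYTHON_KEYWORDS = {"def", "return", "for", "in", "if", "else", "while"}


def keyword_judge(code: str, explanation: str) -> float:
    """Toy stand-in for an LLM judge (an assumption, not the paper's method):
    the fraction of identifiers in the code that the explanation mentions."""
    identifiers = set(re.findall(r"[A-Za-z_]\w*", code)) - PYTHON_KEYWORDS
    if not identifiers:
        return 0.0
    words = set(re.findall(r"[A-Za-z_]\w*", explanation))
    return len(identifiers & words) / len(identifiers)


@dataclass
class ExplanationGate:
    judge: Judge
    threshold: float = 0.5  # assumed pass mark, not taken from the paper
    accepted: list[str] = field(default_factory=list)

    def submit(self, generated_code: str, explanation: str) -> bool:
        """Integrate the code only if the learner's teach-back passes the judge."""
        if self.judge(generated_code, explanation) >= self.threshold:
            self.accepted.append(generated_code)
            return True
        return False  # code stays out of the project until explained


if __name__ == "__main__":
    gate = ExplanationGate(judge=keyword_judge)
    snippet = "def total(prices):\n    return sum(prices)"
    print(gate.submit(snippet, "looks fine to me"))  # False
    print(gate.submit(snippet, "total adds up every value in prices using sum"))  # True
```

Passing the judge in as a callable keeps the gating logic separate from the scoring logic, so an LLM-backed judge could replace the heuristic without changing `submit`.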