Vibe coding produces correct, executable code at speed, but leaves no record of the structural commitments, dependencies, or evidence behind it. Reviewers cannot determine what invariants were assumed, what changed, or why a regression occurred. This is not a generation failure but a control failure: the dominant artifact of AI-assisted development (code plus chat history) performs dimension collapse, flattening complex system topology into low-dimensional text and making systems opaque and fragile under change. We propose Agentic Consensus: a paradigm in which the consensus layer C, an operable world model represented as a typed property graph, replaces code as the primary artifact of engineering. Executable artifacts are derived from C and kept in correspondence via synchronization operators Phi (realize) and Psi (rehydrate). Evidence links directly to structural claims in C, making every commitment auditable and under-specification explicit as measurable consensus entropy rather than a silent guess. Evaluation must move beyond code correctness toward alignment fidelity, consensus entropy, and intervention distance. We propose benchmark task families designed to measure whether consensus-based workflows reduce human intervention compared to chat-driven baselines.
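To make the central objects concrete, the following is a minimal sketch of how a consensus layer C could be held as a typed property graph with evidence attached to structural claims. Everything in it is an illustrative assumption rather than the paper's definition: the `Node`, `Edge`, and `ConsensusGraph` names, the `candidates` field, and in particular the entropy measure, which is taken here to be the Shannon entropy (in bits) summed over each claim's still-open candidate resolutions. The synchronization operators Phi and Psi are not modeled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A typed structural claim in C (hypothetical schema, not from the paper)."""
    id: str
    type: str                                        # e.g. "Service", "Decision"
    props: dict = field(default_factory=dict)        # free-form properties
    evidence: list = field(default_factory=list)     # ids of tests, reviews, logs
    candidates: dict = field(default_factory=dict)   # open option -> weight


@dataclass(frozen=True)
class Edge:
    """A typed, directed relation between two claims."""
    src: str
    dst: str
    type: str                                        # e.g. "DEPENDS_ON"


class ConsensusGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.id] = node
        return node

    def add_edge(self, src, dst, type):
        # Reject dangling edges so every dependency points at a real claim.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist in the graph")
        self.edges.append(Edge(src, dst, type))

    def unsupported(self):
        """Ids of claims that have no evidence linked to them."""
        return sorted(n.id for n in self.nodes.values() if not n.evidence)

    def entropy_bits(self):
        """Assumed measure: Shannon entropy summed over open candidate sets."""
        total = 0.0
        for n in self.nodes.values():
            weights = [w for w in n.candidates.values() if w > 0]
            s = sum(weights)
            if s == 0:
                continue  # nothing left open on this claim
            total -= sum((w / s) * math.log2(w / s) for w in weights)
        return total


def build_example():
    g = ConsensusGraph()
    g.add_node(Node("auth", "Service", evidence=["test_login"]))
    # An under-specified commitment: two equally plausible resolutions.
    g.add_node(Node("token_format", "Decision",
                    candidates={"jwt": 1, "opaque": 1}))
    g.add_edge("auth", "token_format", "DEPENDS_ON")
    return g


g = build_example()
print(g.unsupported())    # claims with no evidence attached
print(g.entropy_bits())   # total open uncertainty, in bits
```

In this toy graph the `token_format` decision has no evidence and two equally weighted options, so it appears in `unsupported()` and contributes one bit to the total. Resolving it to a single option and linking a test would drive both signals to zero, which is the sense in which under-specification becomes visible rather than silently guessed.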