AI systems coupled to proof assistants now generate formal mathematics at scale, and the gap between what a checker can verify and what a mathematician would value has become the binding constraint. We model the generation of valuable mathematics as nested language generation in the limit: a verifiable formal language $F$, accessed through a membership oracle (the proof checker), contains an unknown valuable language $H \in \mathcal{H}$ revealed only through an adversarial enumeration of a core $C \subseteq H$ of exact density $α$ (the literature). Every output is valuable ($\in H$), trivial ($\in F \setminus H$), or a hallucination ($\notin F$). We settle four questions. First, the verifier is not taste: the collections admitting generation with breadth are exactly those of the oracle-free model, characterized fiber-wise by Angluin's condition. Second, the verifier does buy sound coverage, covering all unseen valuable statements while asserting only valid ones: possible with it, impossible without it; it relocates unavoidable errors from false to trivial. Third, and centrally, a sharp dichotomy on the tight family: generators emitting finitely many trivia achieve optimal coverage $α/2$, while any infinite trivia allowance, even at vanishing rate, jumps the optimum to $1-α/2$ (both tight, for cores presented as the candidate intersection), and one generator attains both ends. The transition is in trivia count, not rate; the gap $1-α$ is the unrecorded mass. Fourth, both regimes instantiate in a compression model of mathematics. A perfect verifier cannot substitute for taste: the unbounded stream of correct-but-worthless statements is not an engineering accident but a provable necessity, since covering unrecorded valuable mathematics requires an infinite, but asymptotically negligible, stream of certified trivia.
翻译:与证明助手耦合的人工智能系统如今能够大规模生成形式化数学内容,而验证器可验证性与数学家认可价值之间的鸿沟已成为关键约束。我们将有价值数学内容的生成建模为极限嵌套语言生成过程:通过成员关系预言机(证明验证器)访问的可验证形式语言$F$,包含一个未知的有价值语言$H \in \mathcal{H}$,该语言仅通过核心子集$C \subseteq H$(即文献)的对抗性枚举以精确密度$α$揭示。每个输出要么具有价值($\in H$)、要么琐碎($\in F \setminus H$)、要么产生幻觉($\notin F$)。我们解决了四个问题。其一,验证器无法替代品味:允许广度生成的语言集合恰好等同于无预言机模型中的集合,特征化条件由Angluin条件在纤维层面给出。其二,验证器确能保证可靠覆盖率:在仅断言有效语句的同时覆盖所有未见的有价值语句——有验证器时可行,无验证器时不可行;它把不可避免的错误从假阳性转移到琐碎语句。其三,核心结论是关于紧致族的一个尖锐二分法:仅生成有限条琐碎语句的生成器可实现最优覆盖率$α/2$,而任何允许无限条琐碎语句(即使渐近消失)的生成器,其最优覆盖率跃升至$1-α/2$(两者均为紧界,适用于以候选交集形式呈现的核心),且存在单一生成器同时达到两个端点。该跃迁取决于琐碎语句数量而非生成速率;间隙$1-α$正是未被记录的数学知识本体。其四,两种机制均可在数学的压缩模型中得到实例化。完美验证器无法替代品味:无限的正确但无价值语句流并非工程事故,而是可证明的必然性——因为要覆盖未被记录的有价值数学内容,就必须生成无限条(虽渐近可忽略的)已认证琐碎语句。