In the progression toward Artificial General Intelligence (AGI), current evaluation paradigms face an epistemological crisis: static benchmarks measure knowledge breadth but fail to quantify depth of belief. While Simhi et al. (2025) defined the CHOKE phenomenon in standard QA, we extend this framework to quantify "Cognitive Conviction" in System 2 reasoning models. We propose Project Aletheia, a cognitive physics framework that employs Tikhonov regularization to invert the judge's confusion matrix. To validate this methodology without relying on opaque private data, we implement a Synthetic Proxy Protocol. A pilot study on 2025 baselines (e.g., DeepSeek-R1, OpenAI o1) suggests that while reasoning models act as a "cognitive buffer," they may exhibit "Defensive Overthinking" under adversarial pressure. Furthermore, we introduce the Aligned Conviction Score (S_aligned) to verify that increased conviction does not compromise safety. This work serves as a blueprint for measuring AI scientific integrity.
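To make the confusion-matrix inversion concrete, the following is a minimal sketch of Tikhonov-regularized (ridge) inversion of a judge's confusion matrix. The `invert_judge` helper, the 2-class matrix values, and the regularization strength `lam` are illustrative assumptions, not measurements or code from Project Aletheia.

```python
import numpy as np

def invert_judge(C: np.ndarray, p_obs: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Estimate the true label distribution p_true from the observed
    judge verdict distribution p_obs, assuming p_obs = C @ p_true.

    Solves the Tikhonov-regularized least-squares problem
        min_p ||C p - p_obs||^2 + lam * ||p||^2,
    whose closed form is p = (C^T C + lam I)^{-1} C^T p_obs.
    The ridge term lam keeps the inversion stable when C is
    ill-conditioned (i.e., the judge is noisy or biased).
    """
    n = C.shape[1]
    p = np.linalg.solve(C.T @ C + lam * np.eye(n), C.T @ p_obs)
    # Project back onto the probability simplex (clip, then renormalize).
    p = np.clip(p, 0.0, None)
    return p / p.sum()

# Hypothetical 2-class judge: rows = judge verdict, columns = true label.
C = np.array([[0.9, 0.2],   # P(judge says "correct"   | true correct / incorrect)
              [0.1, 0.8]])  # P(judge says "incorrect" | true correct / incorrect)
p_obs = np.array([0.62, 0.38])  # observed verdict frequencies
print(invert_judge(C, p_obs))   # recovers approximately [0.6, 0.4]
```

In this toy setup the observed frequencies were generated from a true distribution of [0.6, 0.4], and the regularized inverse recovers it; with a nearly singular confusion matrix, the ridge term trades a small bias for a stable estimate.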