We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 of the 10 problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessment; Problem 8 was the only one on which the experts were not unanimous. For full transparency, we explain our interpretation of FirstProof and disclose the details of our experiments and our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.