Foundation Models for Discovery and Exploration in Chemical Space - Zhuanzhi Papers

Alexius Wadell, Anoushka Bhutani, Victor Azumah, Austin R. Ellis-Mohr, Andrew J. Stier, Kareem Hegazy, Alexander Brace, Hancheng Zhao, Celia Kelly, Anuj K. Nayak, Yuhan Chen, Dimitrios Simatos, Hongyi Lin, Murali Emani, Venkatram Vishwanath, Kevin Gering, Melisa Alkan, Tom Gibbs, Jack Wells, Wesley W. Qian, Richard C. Gerkin, Benjamin Amorelli, Alexander B. Wiltschko, Lav R. Varshney, Bharath Ramsundar, Karthik Duraisamy, Michael W. Mahoney, Arvind Ramanathan, Venkatasubramanian Viswanathan

From arXiv. Main manuscript: 30 pages (including references), 7 tables, and 5 figures. Supplementary information: 158 pages (including references), 15 tables, and 128 figures.

Accurate prediction of atomistic, thermodynamic, and kinetic properties from molecular structures underpins materials innovation. Existing computational and experimental approaches lack the scalability required to navigate chemical space efficiently. Scientific foundation models trained on large unlabelled datasets offer a path towards navigating chemical space across application domains. Here, we develop MIST, a family of molecular foundation models with up to an order of magnitude more parameters and data than prior works. Trained using a novel tokenizer, Smirk, which comprehensively captures nuclear, electronic, and geometric information, MIST learns representations of a diverse range of molecules. MIST models have been fine-tuned to predict more than 400 structure-property relationships and match or exceed state-of-the-art performance across diverse benchmarks, from physiology to electrochemistry. We demonstrate the ability of these models to solve real-world problems across chemical space, from multi-objective electrolyte solvent screening to stereochemical reasoning for organometallics and mixture property prediction. The clearest demonstration of a foundation model is its ability to solve problems that were neither explicit targets of training nor central to the intentions of its developers. We identify olfactory perception mapping as such a problem and show that MIST accurately predicts scent profiles and learns a hierarchical representation of olfactory space consistent with hyperbolic geometry. We formulate hyperparameter-aware Bayesian neural scaling laws that eliminate the need for hyperparameter sweeps at every scale, making it feasible to train large compute-optimal models on a limited compute budget. The methods and findings presented here represent a significant step towards accelerating materials discovery, design, and optimization using foundation models.
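
The abstract does not state the functional form of its scaling laws. As a hedged illustration of what fitting a neural scaling law involves, the sketch below fits the common saturating power law L(N) = E + A * N^(-alpha), where L is loss, N is model size, and E is an irreducible loss floor, to a handful of (size, loss) points. This is a generic, non-Bayesian, single-variable fit and is NOT the paper's hyperparameter-aware Bayesian formulation; the function name fit_power_law, the grid search over E, and the synthetic data are all assumptions made for illustration.

```python
import math


def fit_power_law(sizes, losses, offsets):
    """Fit L(N) = E + A * N**(-alpha) (hypothetical helper, generic power law).

    For each candidate irreducible loss E, log(L - E) = log(A) - alpha * log(N)
    is linear in log(N), so ordinary least squares gives A and alpha.
    The candidate E with the lowest squared error in loss space is kept.
    """
    xs = [math.log(n) for n in sizes]
    best = None
    for e in offsets:
        if any(loss <= e for loss in losses):
            continue  # log(L - E) is undefined for this candidate E
        ys = [math.log(loss - e) for loss in losses]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        a, alpha = math.exp(my - slope * mx), -slope
        # Score the candidate in the original loss space, not in log space.
        sse = sum((e + a * n ** (-alpha) - loss) ** 2 for n, loss in zip(sizes, losses))
        if best is None or sse < best[0]:
            best = (sse, e, a, alpha)
    if best is None:
        raise ValueError("no candidate offset lies below every observed loss")
    _, e, a, alpha = best
    return e, a, alpha


if __name__ == "__main__":
    # Synthetic, illustrative data generated from E=1.5, A=400, alpha=0.3.
    sizes = [1e6, 3e6, 1e7, 3e7, 1e8, 3e8]
    losses = [1.5 + 400.0 * n ** -0.3 for n in sizes]
    print(fit_power_law(sizes, losses, [i * 0.01 for i in range(200)]))
```

The grid search only keeps the sketch dependency-free; a practical fit would use a nonlinear optimizer and, as the abstract describes for MIST, would also account for uncertainty and for how hyperparameters shift the curve.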
