LTAU-FF: Loss Trajectory Analysis for Uncertainty in Atomistic Force Fields

Model ensembles are simple and effective tools for estimating the prediction uncertainty of deep learning atomistic force fields. Despite this, widespread adoption of ensemble-based uncertainty quantification (UQ) techniques is limited by the high computational costs incurred by ensembles during both training and inference. In this work we leverage the cumulative distribution functions (CDFs) of per-sample errors obtained over the course of training to efficiently represent the model ensemble, and couple them with a distance-based similarity search in the model latent space. Using these tools, we develop a simple UQ metric (which we call LTAU) that leverages the strengths of ensemble-based techniques without requiring the evaluation of multiple models during either training or inference. As an initial test, we apply our method towards estimating the epistemic uncertainty in atomistic force fields (LTAU-FF) and demonstrate that it can be easily calibrated to accurately predict test errors on multiple datasets from the literature. We then illustrate the utility of LTAU-FF in two practical applications: 1) tuning the training-validation gap for an example dataset, and 2) predicting errors in relaxation trajectories on the OC20 IS2RS task. Though in this work we focus on the use of LTAU with deep learning atomistic force fields, we emphasize that it can be readily applied to any regression task, or any ensemble-generation technique, to provide a reliable and easy-to-implement UQ metric.

翻译：模型集成是估计深度学习原子力场预测不确定性的简单而有效的工具。然而，基于集成的不确定性量化技术因其在训练和推理过程中产生的高计算成本而限制了其广泛应用。本文利用训练过程中获得的每个样本误差的累积分布函数高效地表示模型集成，并将其与模型潜在空间中的基于距离的相似性搜索相结合。基于这些工具，我们提出了一种简单的不确定性量化指标（称为LTAU），该指标利用了集成技术的优势，而无需在训练或推理过程中评估多个模型。作为初步测试，我们将该方法应用于估计原子力场的认知不确定性（LTAU-FF），并证明其可以轻松校准以准确预测文献中多个数据集上的测试误差。随后，我们展示了LTAU-FF在两个实际应用中的实用性：1) 针对示例数据集调整训练-验证差距，以及2) 预测OC20 IS2RS任务中弛豫轨迹的误差。尽管本文聚焦于LTAU在深度学习原子力场中的应用，但需强调，该方法可便捷地应用于任何回归任务或任何集成生成技术，以提供可靠且易于实现的不确定性量化指标。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

【斯坦福大学】面向机器学习的概率和统计要点速览(中文版)《CS 229 - Probabilities and Statistics refresher》by Afshine Amidi, Shervine Amidi

专知会员服务

48+阅读 · 2019年12月19日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日