Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model material properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project's OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. These include improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST improves energy MAE by 4 to 42$\%$ while reducing compute time by a factor of 3 to 8, depending on the targeted task and model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{https://phast.readthedocs.io}.