In recent decades, the capacity to generate large amounts of data in science and engineering applications has grown steadily. Meanwhile, progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high-fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance on both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design.