Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models

The open-world test data encountered after deployment is often mixed with out-of-distribution (OOD) samples, on which deployed models struggle to make accurate predictions. Traditional detection methods must trade off OOD detection against in-distribution (ID) classification performance, since both tasks share the same representation learning model. In this work, we propose to detect OOD molecules with an auxiliary diffusion-model-based framework that compares the similarity between input molecules and their reconstructed graphs. Because the generative model is biased toward reconstructing ID training samples, OOD molecules receive much lower similarity scores, which facilitates detection. Although conceptually simple, this vanilla framework faces two significant challenges in practical detection applications. First, popular similarity metrics based on Euclidean distance fail to account for complex graph structure. Second, the generative model's iterative denoising steps are time-consuming, especially when run over an enormous pool of drug molecules. To address these challenges, our research pioneers an approach called Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations: i) an effective metric that comprehensively quantifies the matching degree between input and reconstructed molecules; ii) a novel graph generator that constructs prototypical graphs aligned with ID data but distant from OOD data; and iii) an efficient, scalable OOD detector that compares test samples with the pre-constructed prototypical graphs, thereby omitting the generative process for every new molecule. Extensive experiments on ten benchmark datasets against six baselines demonstrate the superiority of PGR-MOOD.
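
The detection step described above (score a test molecule by its similarity to a bank of pre-constructed prototypical graphs, with no generative pass per molecule) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PGR-MOOD implementation: the abstract does not name its similarity metric, so a Weisfeiler-Lehman (WL) subtree kernel stands in as a generic structure-aware similarity, and the prototypes are hand-written toy graphs rather than diffusion-model outputs. The helper names `wl_features`, `build_bank`, `ood_score`, `is_ood` and the `threshold` value are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy graph format (an assumption for this sketch): (labels, edges), where
# labels[i] is the atom symbol of node i and edges lists undirected (i, j) pairs.
Graph = tuple[list[str], list[tuple[int, int]]]


def wl_features(graph: Graph, iterations: int = 2) -> Counter:
    """Count WL subtree labels over `iterations` refinement rounds (stand-in metric)."""
    labels, edges = graph
    neighbors = [[] for _ in labels]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    current = list(labels)
    features = Counter(current)
    for _ in range(iterations):
        # New label = old label plus the sorted multiset of old neighbour labels,
        # so the features reflect graph structure, not just atom counts.
        current = [
            current[v] + "(" + ",".join(sorted(current[u] for u in neighbors[v])) + ")"
            for v in range(len(current))
        ]
        features.update(current)
    return features


def cosine(fa: Counter, fb: Counter) -> float:
    """Cosine-normalised kernel value in [0, 1]; 1.0 means identical feature counts."""
    dot = sum(fa[k] * fb[k] for k in fa.keys() & fb.keys())
    norm = sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


def build_bank(prototypes: list[Graph], iterations: int = 2) -> list[Counter]:
    """Precompute prototype features once, offline (hypothetical prototype bank)."""
    return [wl_features(p, iterations) for p in prototypes]


def ood_score(test: Graph, bank: list[Counter], iterations: int = 2) -> float:
    """Highest similarity to any prototype; an empty bank scores 0.0."""
    ft = wl_features(test, iterations)
    return max((cosine(ft, fp) for fp in bank), default=0.0)


def is_ood(test: Graph, bank: list[Counter], threshold: float = 0.5) -> bool:
    # Low similarity to every ID-like prototype => flag as out-of-distribution.
    return ood_score(test, bank) < threshold


if __name__ == "__main__":
    bank = build_bank([(["C", "C", "O"], [(0, 1), (1, 2)])])
    for name, mol in [("C-C-O", (["C", "C", "O"], [(0, 1), (1, 2)])),
                      ("S-S-Cl", (["S", "S", "Cl"], [(0, 1), (1, 2)]))]:
        print(name, round(ood_score(mol, bank), 3), is_ood(mol, bank))
```

Because the prototype features are computed once in `build_bank`, scoring a new molecule costs one feature extraction plus one kernel evaluation per prototype, which mirrors the abstract's point about skipping the generative process for every test molecule.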

翻译：开放世界测试数据集中常混杂分布外样本，导致部署模型难以做出准确预测。传统检测方法因共享同一表征学习模型，需在分布外检测与分布内分类性能之间权衡。本研究提出基于辅助扩散模型框架检测分子分布外样本，通过比较输入分子与重构图的相似性实现。由于生成模型对分布内训练样本存在重构偏置，分布外分子的相似度得分会显著降低，从而便于检测。尽管概念简单，将该基础框架扩展至实际检测应用仍面临两大挑战：其一，基于欧氏距离的常用相似度度量无法考虑复杂图结构；其二，涉及迭代去噪步骤的生成模型耗时严重，尤其在处理海量药物分子时更为突出。为攻克这些难题，本研究率先提出基于原型图重构的分子分布外检测方法（PGR-MOOD），其创新点包括：i) 提出能全面量化输入分子与重构分子匹配度的有效度量；ii) 设计创造性图生成器，构建符合分布内特征但偏离分布外特征的原型图；iii) 开发高效可扩展的分布外检测器，通过比对测试样本与预构建原型图的相似性，避免对每个新分子执行生成过程。在十个基准数据集和六条基线上的大量实验证明了本方法的优越性。