Existing works based on molecular knowledge neglect the 3D geometric structure of molecules and fail to learn the high-dimensional information of medications, leading to structural confusion. They also do not extract key substructures from a single patient visit, and therefore fail to identify the medication molecules suitable for the current visit. To address these limitations, we propose BiMoRec, a bimodal molecular recommendation framework that introduces 3D molecular structures to obtain atomic 3D coordinates and edge indices, overcoming the lack of high-dimensional molecular information inherent in 2D molecular structures. To retain the fast training and prediction of the recommendation system, we use bimodal graph contrastive pretraining to maximize the mutual information between the two molecular modalities, thereby fusing the 2D and 3D molecular graphs. We also design a molecular multi-step enhancement mechanism to re-calibrate the weights assigned to molecules. Specifically, we employ a pretraining method that captures 2D and 3D molecular structure representations together with substructure representations, and leverages contrastive learning to extract mutual information. We then use the pretrained encoder to generate molecular representations and enhance them in three steps: intra-visit, molecular per-visit, and latest-visit. Finally, we apply temporal information aggregation to generate the final medication combinations. Experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that our method achieves state-of-the-art performance.
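The abstract does not state which contrastive objective is used for the bimodal pretraining. As an illustration only, the sketch below assumes a symmetric InfoNCE loss, a common lower-bound surrogate for mutual information, applied to paired 2D and 3D embeddings of the same molecules. The function names (`cosine`, `info_nce`), the temperature value, and the toy embeddings are hypothetical and are not taken from BiMoRec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(z2d, z3d, temperature=0.1):
    """Symmetric InfoNCE loss (assumed objective, not confirmed by the paper).

    z2d[i] and z3d[i] are the 2D-graph and 3D-graph embeddings of molecule i.
    Matching pairs are positives; all other pairs in the batch are negatives.
    """
    n = len(z2d)
    # Temperature-scaled similarity matrix: rows index 2D views, columns 3D views.
    sim = [[cosine(z2d[i], z3d[j]) / temperature for j in range(n)]
           for i in range(n)]

    def one_direction(logits):
        # Mean cross-entropy with the diagonal entry as the correct class.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)  # subtract the max for a stable log-sum-exp
            log_denom = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_denom - row[i]
        return total / len(logits)

    sim_t = [list(col) for col in zip(*sim)]  # 3D -> 2D direction
    return 0.5 * (one_direction(sim) + one_direction(sim_t))


# Toy embeddings for three molecules (hypothetical values).
aligned_2d = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_3d = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Same 3D embeddings, but paired with the wrong molecules.
shuffled_3d = [aligned_3d[1], aligned_3d[2], aligned_3d[0]]

loss_aligned = info_nce(aligned_2d, aligned_3d)
loss_shuffled = info_nce(aligned_2d, shuffled_3d)
print(loss_aligned < loss_shuffled)
```

Under this assumed objective, minimizing the loss pulls the two views of each molecule together and pushes views of different molecules apart, which is one way the 2D and 3D modalities could be aligned before the encoder is reused for recommendation.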