The rapid discovery of new reactions and molecules in recent years has been facilitated by advances in high-throughput screening, access to a much more complex chemical design space, and the development of accurate molecular modeling frameworks. A holistic study of the growing chemistry literature is therefore needed, one that focuses on understanding recent trends and extrapolating them into possible future trajectories. To this end, several network-theory-based studies have been reported that use a directed graph representation of chemical reactions. Here, we instead represent chemical reactions as a hypergraph in which hyperedges represent reactions and nodes represent the participating molecules. We use a standard reaction dataset to construct a hypernetwork and report its statistics, including degree distributions, average path length, assortativity (degree correlations), PageRank centrality, and graph-based clusters (communities). We also compute each statistic for an equivalent directed graph representation of the reactions to draw parallels and highlight differences between the two. To demonstrate the applicability of the hypergraph reaction representation to AI tasks, we generate dense hypergraph embeddings and use them for reaction classification. We conclude that the hypernetwork representation is flexible, preserves reaction context, and uncovers insights that are not apparent in a traditional directed graph representation of chemical reactions.
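As a rough illustration of the two representations compared in the abstract, the following minimal Python sketch builds both a hypergraph incidence map and a pairwise directed-graph projection from the same reactions. Everything in it is an assumption made for illustration: the toy reactions `r1` to `r3`, the molecule labels `A` to `F`, and the helper names `hypergraph_incidence`, `directed_edges`, `hyperdegree`, and `out_degree` are hypothetical and are not taken from the paper's dataset or code. The directed projection shown (one edge per reactant-product pair) is one common convention and may differ from the one the authors use.

```python
from collections import defaultdict

# Hypothetical toy reactions (illustrative only, not from the paper's dataset).
# Each reaction is a hyperedge: (set of reactants, set of products).
REACTIONS = {
    "r1": ({"A", "B"}, {"C"}),       # A + B -> C
    "r2": ({"C", "D"}, {"E", "F"}),  # C + D -> E + F
    "r3": ({"A", "D"}, {"F"}),       # A + D -> F
}


def hypergraph_incidence(reactions):
    """Map each molecule (node) to the reactions (hyperedges) it takes part in."""
    incidence = defaultdict(set)
    for rid, (reactants, products) in reactions.items():
        for mol in reactants | products:
            incidence[mol].add(rid)
    return dict(incidence)


def directed_edges(reactions):
    """Pairwise projection: one directed edge per (reactant, product) pair."""
    edges = set()
    for reactants, products in reactions.values():
        for src in reactants:
            for dst in products:
                edges.add((src, dst))
    return edges


def hyperdegree(incidence):
    """Node degree in the hypergraph: number of reactions a molecule appears in."""
    return {mol: len(rids) for mol, rids in incidence.items()}


def out_degree(edges):
    """Out-degree of each source node in the directed projection."""
    deg = defaultdict(int)
    for src, _ in edges:
        deg[src] += 1
    return dict(deg)


if __name__ == "__main__":
    incidence = hypergraph_incidence(REACTIONS)
    edges = directed_edges(REACTIONS)
    print("hyperdegree:", sorted(hyperdegree(incidence).items()))
    print("directed edges:", sorted(edges))
    print("out-degree:", sorted(out_degree(edges).items()))
```

In this toy example the pair (D, F) arises in both `r2` and `r3`, yet the directed projection keeps a single D to F edge and no longer records which co-reactants were involved. The incidence map keeps each reaction as one unit, which is the sense in which a hypergraph can preserve reaction context.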