Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

From arXiv: 37 pages, submitted to the Machine Learning journal; experimental results partially overlap with https://doi.org/10.1101/2025.02.14.634875

We investigate a relatively under-explored class of hybrid neurosymbolic models that integrate symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In Symbolic Neural Generators (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNG exploits the complementary strengths of symbolic and neural methods. The outcome of an SNG is a pair $(H, X)$, where $H$ is a symbolic description of feasible instances constructed from data, and $X$ a set of generated new instances that satisfy the description. We introduce a semantics for such systems, based on the construction of appropriate base and fibre partially-ordered sets combined into an overall partial order. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.
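The generate-and-reject loop described in the abstract can be illustrated with a toy sketch. It is not the authors' ILP and LLM implementation: the tuple-of-integers "instances", the sum-based specification, and the names `learn_spec`, `propose`, and `sng` are all hypothetical stand-ins chosen to make the control flow concrete.

```python
import random
from typing import Callable, List, Tuple

Instance = Tuple[int, ...]          # toy stand-in for a molecule
Spec = Callable[[Instance], bool]   # toy stand-in for the symbolic description H


def learn_spec(examples: List[Instance]) -> Spec:
    """Hypothetical 'symbolic learner': derive a constraint from few examples.

    Here H accepts any instance whose feature sum lies within one unit of
    the range of sums seen in the examples.
    """
    lo = min(sum(e) for e in examples)
    hi = max(sum(e) for e in examples)
    return lambda x: lo - 1 <= sum(x) <= hi + 1


def propose(rng: random.Random, example: Instance) -> Instance:
    """Hypothetical stand-in for the neural generator: perturb an example."""
    return tuple(v + rng.choice((-1, 0, 1)) for v in example)


def sng(examples: List[Instance], n: int, seed: int = 0,
        max_tries: int = 10_000) -> Tuple[Spec, List[Instance]]:
    """Return the pair (H, X): a specification and new instances satisfying it."""
    rng = random.Random(seed)
    H = learn_spec(examples)
    X: List[Instance] = []
    tries = 0
    while len(X) < n and tries < max_tries:
        tries += 1
        candidate = propose(rng, rng.choice(examples))
        # Reject candidates that violate H, repeat an example, or repeat X.
        if H(candidate) and candidate not in examples and candidate not in X:
            X.append(candidate)
    return H, X


if __name__ == "__main__":
    H, X = sng([(1, 2, 3)], n=5)    # a single example, as the abstract allows
    print(X)
```

In this sketch every element of `X` satisfies `H` by construction, which mirrors the abstract's claim that generated instances meet the symbolic description; in the paper's setting the learner and generator are far richer than these placeholders.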

