We investigate a relatively underexplored class of hybrid neurosymbolic models that integrate symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In Symbolic Neural Generators (SNGs), symbolic learners construct logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural generator, which rejects any instance that violates the symbolic specification. Like other neurosymbolic approaches, SNGs exploit the complementary strengths of symbolic and neural methods. The outcome of an SNG is a pair $(H, X)$, where $H$ is a symbolic description of feasible instances constructed from data, and $X$ is a set of newly generated instances that satisfy the description. We introduce a semantics for such systems, based on the construction of appropriate base and fibre partially ordered sets that are combined into an overall partial order. We implement an SNG that combines a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is in the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, the generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, and they identify several generated molecules as viable for synthesis and wet-lab testing.
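The learn-then-generate-then-reject loop described above can be illustrated with a minimal sketch. Everything in it is a toy assumption made for illustration and is not the implementation evaluated in this work: the hypothetical `learn_spec` stands in for the ILP learner by taking $H$ to be the set of symbols shared by all examples, and the hypothetical `make_toy_proposer` stands in for the LLM with an unreliable random string generator that is only loosely conditioned on $H$.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Spec = FrozenSet[str]  # toy H: symbols every feasible instance must contain


def learn_spec(examples: List[str]) -> Spec:
    """Toy stand-in for the symbolic learner (NOT ILP): keep the symbols
    common to all examples. Works with a single example as well."""
    return frozenset.intersection(*(frozenset(e) for e in examples))


def satisfies(h: Spec, x: str) -> bool:
    """An instance satisfies the toy specification if it contains every required symbol."""
    return h <= set(x)


def make_toy_proposer(seed: int = 0, alphabet: str = "CNOS",
                      length: int = 4) -> Callable[[Spec], str]:
    """Toy stand-in for the neural generator (NOT an LLM). The specification
    only biases sampling, so some proposals still violate it."""
    rng = random.Random(seed)

    def propose(h: Spec) -> str:
        pool = alphabet + "".join(sorted(h)) * 2  # favour the required symbols
        return "".join(rng.choice(pool) for _ in range(length))

    return propose


def sng(examples: List[str], propose: Callable[[Spec], str],
        n_wanted: int, max_tries: int = 10_000) -> Tuple[Spec, List[str]]:
    """Return the pair (H, X): a learned specification and new instances satisfying it."""
    h = learn_spec(examples)
    accepted: List[str] = []
    for _ in range(max_tries):
        if len(accepted) >= n_wanted:
            break
        x = propose(h)  # generation conditioned on H
        # Reject violations of H, duplicates, and instances already in the data.
        if satisfies(h, x) and x not in accepted and x not in examples:
            accepted.append(x)
    return h, accepted


h, xs = sng(["CNO", "NCC"], make_toy_proposer(), n_wanted=5)
print(sorted(h), xs)
```

The design point the sketch is meant to convey is that correctness of $X$ with respect to $H$ comes from the explicit check against the specification, not from trusting the generator's conditioning.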