We investigate a relatively underexplored class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNGs exploit the complementary strengths of symbolic and neural methods. The outcome of an SNG is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible instances constructed from data, $X$ is a set of newly generated instances that satisfy the description, and $W$ is an associated weight. We introduce a semantics for such systems, based on the construction of appropriate \textit{base} and \textit{fibre} partially ordered sets combined into an overall partial order, and outline a probabilistic extension relevant to practical applications. In this extension, SNGs result from searching over a weighted partial ordering. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is in the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.
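To make the shape of the outcome triple $(H, X, W)$ concrete, the following is a minimal sketch of a generate-and-reject loop. It is not the ILP and LLM pipeline described above. Everything in it is an assumption made for illustration: the names `toy_spec`, `toy_proposer`, and `run_sng` are hypothetical, the "neural generator" is replaced by a random string sampler, the specification is hand-written rather than learned, and the weight is simply the observed acceptance rate, which need not match the weight used in the actual semantics. The sketch also omits the step in which the specification conditions the generator, and shows only the rejection of violating instances.

```python
import random
from typing import Callable, List, Tuple

Instance = str
Spec = Callable[[Instance], bool]


def toy_spec(x: Instance) -> bool:
    """Hypothetical H: at least one 'N' and at most 6 symbols (hand-written, not learned)."""
    return "N" in x and len(x) <= 6


def toy_proposer(rng: random.Random) -> Instance:
    """Stand-in for the neural generator: a random string over a tiny alphabet."""
    length = rng.randint(1, 8)
    return "".join(rng.choice("CNO") for _ in range(length))


def run_sng(
    spec: Spec,
    propose: Callable[[random.Random], Instance],
    n_proposals: int,
    seed: int = 0,
) -> Tuple[Spec, List[Instance], float]:
    """Return (H, X, W): the spec, the accepted instances, and a placeholder weight."""
    if n_proposals <= 0:
        raise ValueError("n_proposals must be positive")
    rng = random.Random(seed)
    # Reject every proposed instance that violates the symbolic specification.
    accepted = [x for x in (propose(rng) for _ in range(n_proposals)) if spec(x)]
    # Assumed weight: fraction of proposals that satisfied H.
    weight = len(accepted) / n_proposals
    return spec, accepted, weight


H, X, W = run_sng(toy_spec, toy_proposer, n_proposals=200)
```

By construction every element of `X` satisfies `H`, which is the formal correctness guarantee the rejection step is meant to provide; the quality of `X` beyond that guarantee depends entirely on the proposer.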