The integration of reasoning, learning, and decision-making is key to building more general artificial intelligence systems. As a step in this direction, we propose a novel neural-logic architecture, called differentiable logic machine (DLM), that can solve both inductive logic programming (ILP) and reinforcement learning (RL) problems, and whose solutions can be interpreted as first-order logic programs. Our proposition includes several innovations. First, in contrast to most previous neural-logic approaches, our architecture defines a restricted but expressive continuous relaxation of the space of first-order logic programs by assigning weights to predicates instead of rules. Second, with this differentiable architecture, we propose several training procedures (supervised and RL), based on gradient descent, that can recover a fully interpretable solution (i.e., a logic formula). Third, to accelerate RL training, we design a novel critic architecture that enables actor-critic algorithms. Fourth, to solve hard problems, we propose an incremental training procedure that learns a logic program progressively. Compared to state-of-the-art (SOTA) differentiable ILP methods, DLM successfully solves all the considered ILP problems with a higher percentage of successful seeds (up to 3.5$\times$). On RL problems, when an interpretable solution is not required, DLM outperforms other non-interpretable neural-logic RL approaches in terms of rewards (by up to 3.9%). When interpretability is enforced, DLM can solve harder RL problems (e.g., Sorting, Path). Moreover, we show that deep logic programs can be learned via incremental supervised training. Beyond these results, DLM scales well in terms of memory and computational time, especially during the testing phase, where it can handle many more constants (more than 2$\times$) than SOTA methods.
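To make the idea of weighting predicates rather than whole rules more concrete, here is a minimal toy sketch in plain Python. It is not DLM itself. The rule template (one conjunction with two body slots), the background predicates `even`, `odd`, `small`, `large`, the product t-norm used as fuzzy AND, the squared-error loss, and all helper names (`train`, `extract_rule`, etc.) are hypothetical choices made for illustration. Each slot holds learnable logits over its candidate predicates. A softmax turns those logits into a soft mixture of predicates, gradient descent fits the target labels, and an argmax per slot recovers a crisp, readable rule.

```python
import math

# Hypothetical toy task: constants 0..7 with unary background predicates (0/1 truth values).
CONSTANTS = list(range(8))
BACKGROUND = {
    "even": [1.0 if x % 2 == 0 else 0.0 for x in CONSTANTS],
    "odd": [1.0 if x % 2 == 1 else 0.0 for x in CONSTANTS],
    "small": [1.0 if x < 4 else 0.0 for x in CONSTANTS],
    "large": [1.0 if x >= 4 else 0.0 for x in CONSTANTS],
}
# Labels of the concept to recover: target(X) :- even(X), small(X).
TARGET = [1.0 if (x % 2 == 0 and x < 4) else 0.0 for x in CONSTANTS]

# Assumed fixed rule template: two body slots, each with its own candidate predicates.
SLOTS = [["even", "odd"], ["small", "large"]]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def slot_values(weights, candidates):
    """Soft truth value of one slot: softmax-weighted mix of its candidate predicates."""
    probs = softmax(weights)
    values = [sum(p * BACKGROUND[name][x] for p, name in zip(probs, candidates))
              for x in CONSTANTS]
    return values, probs


def forward(all_weights):
    """Fuzzy AND (product t-norm) of the two slot values for every constant."""
    slots = [slot_values(w, c) for w, c in zip(all_weights, SLOTS)]
    y = [slots[0][0][x] * slots[1][0][x] for x in CONSTANTS]
    return y, slots


def loss(y):
    """Mean squared error against the target labels."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, TARGET)) / len(TARGET)


def train(steps=500, lr=5.0):
    """Plain gradient descent on the predicate logits, with hand-derived gradients."""
    weights = [[0.0] * len(c) for c in SLOTS]  # uniform start: no predicate preferred
    for _ in range(steps):
        y, slots = forward(weights)
        dy = [2 * (yi - ti) / len(TARGET) for yi, ti in zip(y, TARGET)]
        new_weights = []
        for s, (w, candidates) in enumerate(zip(weights, SLOTS)):
            other = slots[1 - s][0]  # d(y)/d(this slot's value) = the other slot's value
            probs = slots[s][1]
            # Gradient w.r.t. each candidate's probability...
            g = [sum(dy[x] * other[x] * BACKGROUND[name][x] for x in CONSTANTS)
                 for name in candidates]
            # ...then back through the softmax: dL/dw_j = p_j * (g_j - sum_k p_k g_k).
            mean_g = sum(p * gk for p, gk in zip(probs, g))
            new_weights.append([wj - lr * pj * (gj - mean_g)
                                for wj, pj, gj in zip(w, probs, g)])
        weights = new_weights  # simultaneous update of both slots
    return weights


def extract_rule(weights):
    """Discretize: keep the highest-weighted predicate in each slot."""
    body = [c[max(range(len(c)), key=lambda k: w[k])] for w, c in zip(weights, SLOTS)]
    return "target(X) :- " + ", ".join(f"{name}(X)" for name in body) + "."


if __name__ == "__main__":
    print(extract_rule(train()))  # expected: target(X) :- even(X), small(X).
```

Because the rule structure is fixed and only the predicate weights are learned, turning the trained model into a logic formula is a simple per-slot argmax. In this toy the extracted rule reproduces the target labels exactly, but such a clean recovery is not guaranteed for harder problems or richer templates.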