A Neuro-Symbolic Approach to Multi-Agent RL for Interpretability and Probabilistic Decision Making

Multi-agent reinforcement learning (MARL) is well suited to runtime decision-making for optimizing the performance of systems in which multiple agents coexist and compete for shared resources. However, applying common deep learning-based MARL solutions to real-world problems is hampered by poor interpretability, low sample efficiency, and partial observability, among other issues. To address these challenges, we present an event-driven formulation in which decision-making is handled by distributed cooperative MARL agents using neuro-symbolic methods. The recently introduced neuro-symbolic Logical Neural Networks (LNN) framework serves as the function approximator for RL, training a rule-based policy that is both logical and interpretable by construction. To enable decision-making under uncertainty and partial observability, we developed a novel probabilistic neuro-symbolic framework, Probabilistic Logical Neural Networks (PLNN), which combines logical reasoning with probabilistic graphical models. In PLNN, the upward/downward inference strategy inherited from LNN is coupled with belief bounds by setting the activation function of the logical operator associated with each neural network node to a probability-respecting generalization of the Fréchet inequalities. These PLNN nodes form the unifying element that combines probabilistic logic and Bayes nets, permitting inference for variables with unobserved states. We demonstrate our contributions by addressing key MARL challenges for power sharing in a system-on-chip application.
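
To make the role of belief bounds concrete, the following is a minimal sketch of the classical Fréchet inequalities applied to interval-valued beliefs. It is not the PLNN activation itself, which the abstract describes only as a generalization of these inequalities. The names `Bounds`, `frechet_and`, `frechet_or`, and `negate` are hypothetical and chosen for illustration; each proposition is assumed to carry a (lower, upper) probability pair, and nothing is assumed about dependence between the two inputs.

```python
from typing import Tuple

# Hypothetical representation: (lower, upper) bounds on a proposition's probability.
Bounds = Tuple[float, float]


def frechet_and(a: Bounds, b: Bounds) -> Bounds:
    """Bounds on P(A and B) when the dependence between A and B is unknown."""
    lower = max(0.0, a[0] + b[0] - 1.0)  # Frechet lower bound for conjunction
    upper = min(a[1], b[1])              # Frechet upper bound for conjunction
    return (lower, upper)


def frechet_or(a: Bounds, b: Bounds) -> Bounds:
    """Bounds on P(A or B) when the dependence between A and B is unknown."""
    lower = max(a[0], b[0])              # Frechet lower bound for disjunction
    upper = min(1.0, a[1] + b[1])        # Frechet upper bound for disjunction
    return (lower, upper)


def negate(a: Bounds) -> Bounds:
    """Bounds on P(not A): the interval is reflected around 0.5."""
    return (1.0 - a[1], 1.0 - a[0])


if __name__ == "__main__":
    a = (0.5, 0.75)   # illustrative belief bounds, not taken from the paper
    b = (0.75, 1.0)
    print("AND:", frechet_and(a, b))
    print("OR: ", frechet_or(a, b))
    print("NOT:", negate(a))
```

The intervals stay valid for any joint distribution consistent with the input bounds, which is why this style of operator suits reasoning over variables whose states are only partially observed.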

