A Decentralized Spike-based Learning Framework for Sequential Capture in Discrete Perimeter Defense Problem

This paper proposes a novel Decentralized Spike-based Learning (DSL) framework for the discrete Perimeter Defense Problem (d-PDP). A team of defenders operates on the perimeter to protect a circular territory from radially incoming intruders. First, the d-PDP is formulated as a spatio-temporal multi-task assignment (STMTA) problem. The STMTA problem is then converted into a multi-label learning problem, whose labels identify the perimeter segments that each defender has to visit in order to protect the perimeter. The DSL framework uses a Multi-Label Classifier using Synaptic Efficacy Function spiking neuRON (MLC-SEFRON) network for deterministic multi-label learning. Each defender contains a single MLC-SEFRON network, which is trained independently using input from that defender's own perspective to enable decentralized operation. The input spikes to the MLC-SEFRON network are obtained directly from the spatio-temporal information of defenders and intruders, without any extra pre-processing step. The output of MLC-SEFRON contains the labels of the segments that a defender has to visit. Based on this multi-label output, a trajectory for capturing the intruders is generated for the defender using the Consensus-Based Bundle Algorithm (CBBA). The target multi-label output for training MLC-SEFRON is obtained from an expert policy. Moreover, an MLC-SEFRON trained for one defender can be used directly to obtain the labels of segments assigned to another defender, without any retraining. The performance of MLC-SEFRON is evaluated under full-observation and partial-observation scenarios for the defender. The overall performance of the DSL framework is then compared with the expert policy and with other existing learning algorithms. The scalability of the DSL framework is evaluated with an increasing number of defenders.
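
The conversion from an assignment to multi-label targets can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the six-segment perimeter, the time window, and the rule "one spike per segment at the intruder's arrival time" are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def segments_to_multilabel(assigned_segments: List[int], num_segments: int) -> List[int]:
    """Binary multi-label target: entry k is 1 if the defender must visit segment k."""
    labels = [0] * num_segments
    for seg in assigned_segments:
        if not 0 <= seg < num_segments:
            raise ValueError(f"segment index {seg} is outside 0..{num_segments - 1}")
        labels[seg] = 1
    return labels


def arrival_times_to_spikes(
    arrival_times: Dict[int, float], num_segments: int, window: float
) -> List[Optional[float]]:
    """Hypothetical encoding: one input per segment, spiking at the intruder's
    arrival time if it falls inside [0, window]; None means no spike."""
    spikes: List[Optional[float]] = [None] * num_segments
    for seg, t in arrival_times.items():
        if 0 <= seg < num_segments and 0.0 <= t <= window:
            spikes[seg] = t
    return spikes


# Assumed expert assignment: defender id -> segments it should visit.
expert_assignment = {0: [1, 4], 1: [2]}
targets = {d: segments_to_multilabel(segs, 6) for d, segs in expert_assignment.items()}

print(targets[0])  # [0, 1, 0, 0, 1, 0]
print(targets[1])  # [0, 0, 1, 0, 0, 0]
print(arrival_times_to_spikes({1: 2.5, 4: 12.0}, 6, 10.0))
# [None, 2.5, None, None, None, None]
```

In this sketch each defender gets its own target vector, mirroring the one-network-per-defender setup described above; the intruder arriving at time 12.0 produces no spike because it lies outside the assumed 10.0 window.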
