Intermittent faults are transient errors that appear and disappear sporadically. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, whose topologies are transient and often unpredictable. However, the recently introduced self-organizing nervous systems (SoNS) approach allows robot swarms to self-organize persistent network structures for the first time, which eases the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy for detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and in the robots' relative positions. Reactively, the robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way until the detected fault resolves. We validate the approach in representative scenarios in which faulty positional data occur during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to the desired formations, with high fault detection accuracy and low false positive rates.
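The reactive step, in which a robot compares information received along different paths with a one-shot likelihood ratio test, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each path delivers a position estimate corrupted by independent isotropic Gaussian noise, that a fault adds a further zero-mean Gaussian offset, and that a single pair of readings is tested. The function names, `noise_std`, `fault_std`, and `log_threshold` are all hypothetical and chosen only for illustration.

```python
import math


def log_likelihood_ratio(primary, backup, noise_std, fault_std):
    """Log-likelihood ratio of 'fault' versus 'no fault' for one pair of readings.

    Assumed model (illustrative, not taken from the paper):
      no fault: primary - backup ~ N(0, 2 * noise_std**2 * I)
      fault:    primary - backup ~ N(0, (2 * noise_std**2 + fault_std**2) * I)
    """
    k = len(primary)
    sq_dist = sum((p - b) ** 2 for p, b in zip(primary, backup))
    var_ok = 2.0 * noise_std ** 2
    var_fault = var_ok + fault_std ** 2
    # Difference of the two isotropic Gaussian log-densities at the observed gap.
    return (-0.5 * k * math.log(var_fault / var_ok)
            + 0.5 * sq_dist * (1.0 / var_ok - 1.0 / var_fault))


def detect_fault(primary, backup, noise_std=0.05, fault_std=1.0, log_threshold=0.0):
    """Return True when the single-sample test favours the fault hypothesis."""
    llr = log_likelihood_ratio(primary, backup, noise_std, fault_std)
    return llr > log_threshold


# Readings that agree to within the assumed noise level are accepted.
print(detect_fault((1.0, 2.0), (1.02, 1.97)))
# A large disagreement between the two paths is flagged.
print(detect_fault((1.0, 2.0), (1.6, 2.5)))
```

With only two paths, a test like this indicates that the readings disagree but not which path is faulty. In the approach described above, that decision rests on distributed consensus across the multiplex network, which this sketch does not model.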