Many modern software systems are built as a set of autonomous software components (also called agents) that collaborate with each other and are situated in an environment. To keep these multiagent systems operational under abnormal circumstances, it is crucial to make them resilient. Existing solutions are often centralised and rely on information manually provided by experts at design time, which makes them rigid and limits the autonomy and adaptability of the system. In this work, we propose a cooperative strategy for identifying the root causes of quality requirement violations in multiagent systems. This strategy allows agents to cooperate in order to determine whether these violations originate from service providers, associated components, or the communication infrastructure. Based on this identification, agents can adapt their behaviour to mitigate and resolve existing abnormalities and restore normal system operation. The strategy consists of an interaction protocol that, together with the proposed algorithms, allows agents playing the protocol roles to diagnose the problems to be repaired. We evaluate our proposal through the implementation of a service-oriented system. The results show that our solution correctly identifies different sources of failures, which favours the selection of the most suitable actions to overcome abnormal situations.
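The kind of root-cause attribution described above can be illustrated with a small sketch. This is not the authors' interaction protocol or algorithms, which the abstract does not detail. It assumes, hypothetically, that a consumer agent measures the end-to-end response time of a request, that the provider agent cooperates by reporting how long it spent processing the request and how much of that time it waited on its associated components, and that normal values (baselines) for each segment are known. The names `ProviderReport`, `Baseline`, and `diagnose` are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    """Sources of a quality requirement violation named in the abstract."""
    SERVICE_PROVIDER = "service provider"
    ASSOCIATED_COMPONENT = "associated component"
    COMMUNICATION_INFRASTRUCTURE = "communication infrastructure"
    NONE = "no violation"


@dataclass
class ProviderReport:
    """Hypothetical reply a provider agent sends when a consumer asks for help."""
    processing_ms: float  # time from receiving the request to sending the reply
    dependency_ms: float  # part of processing_ms spent waiting on associated components


@dataclass
class Baseline:
    """Hypothetical normal-operation durations for each segment of a request."""
    network_ms: float
    own_ms: float
    dependency_ms: float


def diagnose(observed_ms: float, limit_ms: float,
             report: ProviderReport, baseline: Baseline) -> RootCause:
    """Attribute a response-time violation to the segment that exceeds its baseline the most."""
    if observed_ms <= limit_ms:
        return RootCause.NONE
    # Split the consumer's end-to-end observation into three segments
    # using the durations reported by the cooperating provider.
    network_ms = observed_ms - report.processing_ms
    own_ms = report.processing_ms - report.dependency_ms
    excess = {
        RootCause.COMMUNICATION_INFRASTRUCTURE: network_ms - baseline.network_ms,
        RootCause.SERVICE_PROVIDER: own_ms - baseline.own_ms,
        RootCause.ASSOCIATED_COMPONENT: report.dependency_ms - baseline.dependency_ms,
    }
    # The segment with the largest excess over its normal value is the likely root cause.
    return max(excess, key=excess.get)


if __name__ == "__main__":
    normal = Baseline(network_ms=20, own_ms=50, dependency_ms=30)
    # The consumer observed 400 ms against a 200 ms limit; the provider reports
    # 130 ms of processing, 80 ms of which were spent waiting on a dependency.
    print(diagnose(400, 200, ProviderReport(130, 80), normal).value)
```

Attributing the violation to the segment with the largest excess over its baseline is only one simple heuristic chosen for illustration; a practical protocol might also need to cope with missing or unreliable reports and to confirm the diagnosis with other agents before choosing a repair action.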