We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level, node controllers perform intrusion recovery, and on the global level, a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely the machine replacement problem and the inventory replenishment problem, respectively. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have a threshold structure, and we design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.
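To make the two-level threshold structure concrete, the following is a minimal illustrative sketch, not the TOLERANCE implementation. It assumes that each node controller tracks a scalar belief that its node is compromised and recovers the node once that belief reaches a threshold, and that the system controller adds a replica when the estimated number of healthy nodes falls below a threshold. The function names, the belief representation, and the numeric thresholds are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def should_recover(compromise_belief: float, recovery_threshold: float) -> bool:
    """Local level (assumed form): recover once the belief reaches the threshold."""
    return compromise_belief >= recovery_threshold


def should_add_replica(healthy_nodes: int, min_healthy_nodes: int) -> bool:
    """Global level (assumed form): add a replica when too few nodes are healthy."""
    return healthy_nodes < min_healthy_nodes


def control_step(
    beliefs: List[float],
    recovery_threshold: float,
    min_healthy_nodes: int,
) -> Tuple[List[int], bool]:
    """Run one step of both control levels.

    Returns the indices of nodes to recover and whether to add a replica.
    Assumption: a node counts as healthy if its belief is below the
    recovery threshold.
    """
    to_recover = [
        i for i, b in enumerate(beliefs) if should_recover(b, recovery_threshold)
    ]
    healthy = len(beliefs) - len(to_recover)
    return to_recover, should_add_replica(healthy, min_healthy_nodes)


# Hypothetical example: four replicas, two of which look compromised.
nodes_to_recover, add_replica = control_step(
    beliefs=[0.1, 0.85, 0.3, 0.95],
    recovery_threshold=0.7,
    min_healthy_nodes=3,
)
```

In this example, the nodes at indices 1 and 3 are selected for recovery, and because only two nodes remain below the threshold, the system controller requests an additional replica. The practical appeal of a threshold structure is that each strategy is described by a single number, which keeps both computing and executing it cheap.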