Approximate Byzantine Fault-Tolerance in Distributed Optimization

From arXiv: 43 pages, 5 figures, and 1 table. The report is an important extension to prior work https://dl.acm.org/doi/abs/10.1145/3382734.3405748 and arXiv:2003.09675; this version adds an alternative result with a better analysis.

This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty, which renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous. A more reasonable objective for an algorithm in this scenario is to allow all the non-faulty agents to compute a minimum point of only the non-faulty agents' aggregate cost. Prior work shows that if there are up to $f$ (out of $n$) Byzantine agents, then a minimum point of the non-faulty agents' aggregate cost can be computed exactly if and only if the non-faulty agents' costs satisfy a certain redundancy property called $2f$-redundancy. However, $2f$-redundancy is an ideal property that can be satisfied only in systems free from noise or uncertainties, which can make the goal of exact fault-tolerance unachievable in some applications. Thus, we introduce the notion of $(f,\epsilon)$-resilience, a generalization of exact fault-tolerance wherein the objective is to find an approximate minimum point of the non-faulty aggregate cost, with $\epsilon$ accuracy. This approximate fault-tolerance can be achieved under a condition that is weaker than $2f$-redundancy and easier to satisfy in practice. We obtain necessary and sufficient conditions for achieving $(f,\epsilon)$-resilience, characterizing the correlation between relaxation in redundancy and approximation in resilience. When the agents' cost functions are differentiable, we obtain conditions for $(f,\epsilon)$-resilience of the distributed gradient-descent method equipped with robust gradient aggregation.
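To make "distributed gradient descent with robust gradient aggregation" concrete, here is a toy, single-process sketch built on illustrative assumptions that are not taken from the paper. The assumptions are: quadratic local costs $Q_i(x) = \tfrac{1}{2}\|x - a_i\|^2$, a central aggregation step, $f$ faulty agents that all report one fixed large vector, and a norm-based filter that drops the $f$ largest-norm gradients and sums the rest (in the spirit of comparative gradient elimination). All function names here (`cge_aggregate`, `sum_aggregate`, `run_gradient_descent`) are hypothetical. This is not the paper's algorithm and does not check its conditions for $(f,\epsilon)$-resilience.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def vec_sum(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Coordinate-wise sum of a list of vectors."""
    total = [0.0] * dim
    for v in vectors:
        for k in range(dim):
            total[k] += v[k]
    return total


def sum_aggregate(grads: Sequence[Vector], f: int) -> Vector:
    """Non-robust baseline (hypothetical): add every reported gradient, ignoring f."""
    return vec_sum(grads, len(grads[0]))


def cge_aggregate(grads: Sequence[Vector], f: int) -> Vector:
    """Hypothetical norm-based filter: sort gradients by Euclidean norm,
    discard the f largest, and sum the remaining n - f."""
    kept = sorted(grads, key=norm)[: len(grads) - f]
    return vec_sum(kept, len(grads[0]))


def run_gradient_descent(
    honest_centers: Sequence[Vector],
    byzantine_grad: Vector,
    f: int,
    aggregate: Callable[[Sequence[Vector], int], Vector],
    x0: Vector,
    step: float = 0.05,
    iters: int = 100,
) -> Vector:
    """Assumed model: honest agent i has Q_i(x) = 0.5 * ||x - a_i||^2 and
    reports the gradient x - a_i; each of the f faulty agents reports
    byzantine_grad every round. The aggregated gradient drives the update."""
    x = list(x0)
    dim = len(x)
    for _ in range(iters):
        grads = [[x[k] - a[k] for k in range(dim)] for a in honest_centers]
        grads += [list(byzantine_grad) for _ in range(f)]
        g = aggregate(grads, f)
        x = [x[k] - step * g[k] for k in range(dim)]
    return x


if __name__ == "__main__":
    # Honest aggregate cost is minimized at the mean of the centers, (0, 0).
    centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.0, 0.0]]
    bad = [100.0, 100.0]  # vector reported by each faulty agent
    robust = run_gradient_descent(centers, bad, 2, cge_aggregate, [5.0, 5.0])
    naive = run_gradient_descent(centers, bad, 2, sum_aggregate, [5.0, 5.0])
    print("norm-based filter:", robust)
    print("plain sum:        ", naive)
```

In this run the faulty gradients are large enough to be discarded in every round, so the filtered iterate ends essentially at the honest minimizer $(0, 0)$, while the plain sum settles near $(-40, -40)$. A faulty agent that reports moderate-norm gradients could survive such a filter and bias the sum, which is the kind of residual error that $\epsilon$ is meant to bound.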
