In this paper, we present distributed fault-tolerant algorithms that approximate the centroid of a set of $n$ data points in $\mathbb{R}^d$. Our work falls into the broader area of approximate multidimensional Byzantine agreement. The standard approach in existing algorithms is to agree on a vector inside the convex hull of all correct vectors. This strategy dismisses many possibly correct data points, so the algorithm does not necessarily agree on a representative value. To find better convergence strategies, we introduce the novel concept of an approximation of the centroid in the presence of Byzantine adversaries. We show that, in the synchronous case, the standard agreement algorithms cannot compute an approximation of the centroid better than $2d$. We investigate the trade-off between the quality of the approximation, the resilience of the algorithm, and the validity of the solution in order to design better approximation algorithms. For the synchronous case, we show that an optimal approximation of the centroid is achievable when the number of Byzantine data points satisfies $t<n/(d+1)$. This approach, however, gives no guarantee on the validity of the solution. We therefore develop a second approach that reaches a $2\sqrt{d}$-approximation of the centroid while satisfying the standard validity condition for agreement protocols. We are even able to restrict the validity condition to agreement inside the box of correct data points while achieving the optimal resilience of $t<n/3$. For the asynchronous case, we can adapt all three algorithms to reach the same approximation results (up to a constant factor). Our results suggest that it is reasonable to study the trade-off between validity conditions and the quality of the solution.
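To make the notions of "centroid" and "box of correct data points" concrete, the following is a minimal, centralized Python sketch. It is not one of the algorithms of this paper and is not a distributed protocol: it only illustrates, on a toy input, why the plain centroid is not robust and how a well-known coordinate-wise trimming rule (assumed here purely for illustration) keeps the output inside the box of correct points when at most $t$ of the $n$ points are Byzantine and $n > 2t$. All function names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise average of all points."""
    n = len(points)
    d = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(d)]


def trimmed_mean(points: Sequence[Point], t: int) -> List[float]:
    """Illustrative rule (an assumption, not the paper's algorithm):
    in every coordinate, drop the t smallest and t largest values and
    average the rest. With at most t Byzantine points, every surviving
    value lies between the smallest and largest correct value."""
    n = len(points)
    if n <= 2 * t:
        raise ValueError("need n > 2t so that some values survive trimming")
    d = len(points[0])
    result = []
    for k in range(d):
        values = sorted(p[k] for p in points)
        kept = values[t:n - t]
        result.append(sum(kept) / len(kept))
    return result


def in_box(x: Point, points: Sequence[Point]) -> bool:
    """True if x lies in the axis-aligned bounding box of the given points."""
    return all(
        min(p[k] for p in points) <= x[k] <= max(p[k] for p in points)
        for k in range(len(x))
    )


# Toy input: four correct points and one Byzantine outlier (t = 1).
correct = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
byzantine = [[100.0, -100.0]]
all_points = correct + byzantine

naive = centroid(all_points)          # pulled far away by the outlier
robust = trimmed_mean(all_points, 1)  # stays inside the box of correct points
```

In this toy input, the single outlier drags the plain centroid outside the box spanned by the correct points, while the trimmed value remains inside it. How close such a value is to the true centroid of the correct points is exactly the approximation question the paper studies.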