In this paper, we present distributed fault-tolerant algorithms that approximate the centroid of a set of $n$ data points in $\mathbb{R}^d$. Our work falls into the broader area of approximate multidimensional Byzantine agreement. The standard approach in existing algorithms is to agree on a vector inside the convex hull of all correct vectors. This strategy dismisses many possibly correct data points, so the algorithm does not necessarily agree on a representative value. In fact, in the synchronous case, this approach cannot achieve an approximation of the centroid better than $2d$. To find better approximation algorithms for the centroid, we investigate the trade-off between the quality of the approximation, the resilience of the algorithm, and the validity of the solution. For the synchronous case, we show that it is possible to achieve a $1$-approximation of the centroid with up to $t<n/(d+1)$ Byzantine data points. This approach, however, gives no guarantee on the validity of the solution. Therefore, we develop a second approach that reaches a $2\sqrt{d}$-approximation of the centroid while satisfying the standard validity condition for agreement protocols. We are even able to restrict the validity condition to agreement inside the box of correct data points while achieving the optimal resilience of $t<n/3$. For the asynchronous case, we can adapt all three algorithms to reach the same approximation results (up to a constant factor). Our results suggest that it is reasonable to study the trade-off between validity conditions and the quality of the solution.
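To make the notion of agreeing inside the box of correct data points more concrete, the following is a minimal, centralized sketch and not one of the algorithms of this paper. It assumes a well-known coordinate-wise trimming rule: in each coordinate, the $t$ smallest and $t$ largest received values are discarded and the rest are averaged. With at most $t$ Byzantine points and $n > 2t$, every value that is kept lies between the smallest and largest correct value in that coordinate, so the output lies in the bounding box of the correct points. The function names and the sample data are hypothetical and chosen only for illustration.

```python
def centroid(points):
    """Plain centroid (coordinate-wise mean) of a list of d-dimensional points."""
    n = len(points)
    d = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(d)]


def trimmed_mean_per_coordinate(points, t):
    """Illustrative rule (an assumption, not the paper's protocol):
    per coordinate, drop the t smallest and t largest values, then average."""
    n = len(points)
    if n <= 2 * t:
        raise ValueError("need n > 2t so that some values survive trimming")
    d = len(points[0])
    result = []
    for k in range(d):
        values = sorted(p[k] for p in points)
        kept = values[t:n - t]
        result.append(sum(kept) / len(kept))
    return result


def in_box(x, correct):
    """True if x lies in the axis-aligned bounding box of the correct points."""
    return all(
        min(p[k] for p in correct) <= x[k] <= max(p[k] for p in correct)
        for k in range(len(x))
    )


# Hypothetical data: four correct points and one Byzantine outlier (t = 1).
correct = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
byzantine = [[100.0, -100.0]]
received = correct + byzantine

naive = centroid(received)                             # pulled far away by the outlier
robust = trimmed_mean_per_coordinate(received, t=1)    # stays inside the box

print(in_box(naive, correct), in_box(robust, correct))
```

In this toy instance the naive centroid of all received points leaves the box of correct points, while the trimmed estimate stays inside it. The sketch says nothing about approximation ratios or about how distributed nodes would reach agreement on such a value.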