The geometric median of a tuple of vectors is the vector that minimizes the sum of Euclidean distances to the vectors of the tuple. Classically known as the Fermat-Weber problem and applied to facility location, it has become a major component of the robust learning toolbox. It is typically used to aggregate the (processed) inputs of different data providers, whose motivations may diverge, especially in applications like content moderation. Interestingly, as a voting system, the geometric median has well-known desirable properties: it is a provably good approximation of the average, it is robust to a minority of malicious voters, and it satisfies the "one voter, one unit force" fairness principle. However, the extent to which the geometric median is strategyproof was not known. Namely, can a strategic voter significantly gain by misreporting their preferred vector? We prove in this paper that, perhaps surprisingly, the geometric median is not even $\alpha$-strategyproof, where $\alpha$ bounds what a voter can gain by deviating from truthfulness. But we also prove that, in the limit of a large number of voters with i.i.d. preferred vectors, the geometric median is asymptotically $\alpha$-strategyproof, and we show how to compute this bound $\alpha$. We then generalize our results to voters who care more about some dimensions than others. Roughly, we show that, if some dimensions are more polarized and regarded as more important, then the geometric median becomes less strategyproof. Interestingly, we also show how skewed geometric medians can improve strategyproofness. Nevertheless, if voters care differently about different dimensions, we prove that no skewed geometric median can achieve strategyproofness for all voters. Overall, our results constitute a coherent set of insights into the extent to which the geometric median is suitable for aggregating high-dimensional disagreements.
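
To make the definition concrete, here is a minimal sketch that approximates the geometric median with Weiszfeld's iteration, a standard fixed-point method that the abstract itself does not name. The function names `geometric_median` and `total_distance`, the iteration count, and the sample votes are illustrative assumptions, not taken from the paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_distance(z, points):
    """Objective minimized by the geometric median: sum of distances to all points."""
    return sum(euclidean(z, p) for p in points)


def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the geometric median with Weiszfeld's iteration (illustrative sketch)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    estimate = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iterations):
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for p in points:
            # Each point pulls with weight 1/distance; eps guards against division by zero.
            w = 1.0 / max(euclidean(p, estimate), eps)
            weight_total += w
            for i in range(dim):
                weighted_sum[i] += w * p[i]
        estimate = [s / weight_total for s in weighted_sum]
    return estimate


# Hypothetical votes: three voters near the origin and one extreme outlier.
votes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
median = geometric_median(votes)
mean = [sum(v[i] for v in votes) / len(votes) for i in range(2)]
print("geometric median:", median)
print("mean:", mean)
```

In this toy example the single extreme vote drags the mean far from the three clustered votes, while the geometric median stays close to the cluster, which illustrates the robustness to a minority of voters mentioned above. It does not by itself say anything about strategyproofness.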