New Bounds on the Accuracy of Majority Voting for Multi-Class Classification

Majority voting is a simple mathematical function that returns the value that appears most often in a collection of values. As a popular decision fusion technique, the majority voting function (MVF) is used to resolve conflicts when a number of independent voters report their opinions on a classification problem. Despite its importance and its applications in ensemble learning, data crowd-sourcing, remote sensing, and data oracles for blockchains, the accuracy of the MVF for the general multi-class classification problem has remained unknown. In this paper, we derive a new upper bound on the accuracy of the MVF for the multi-class classification problem. More specifically, we show that under certain conditions, the error rate of the MVF exponentially decays toward zero as the number of independent voters increases. Conversely, the error rate of the MVF exponentially grows with the number of independent voters if these conditions are not met. We first explore the problem for independent and identically distributed voters, where we assume that every voter follows the same conditional probability distribution of voting for different classes, given the true classification of the data point. Next, we extend our results to the case where the voters are independent but non-identically distributed. Using the derived results, we then discuss the accuracy of truth discovery algorithms. We show that in the best-case scenario, truth discovery algorithms operate as an amplified MVF: they achieve a small error rate only when the MVF achieves a small error rate and, conversely, a large error rate when the MVF achieves a large error rate. In the worst-case scenario, truth discovery algorithms may achieve a higher error rate than the MVF. Finally, we confirm our theoretical results using numerical simulations.
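
As an illustration of the i.i.d. setting described above, the following is a minimal Monte Carlo sketch, not the simulation used in the paper. It assumes three classes, a true class labelled 0, and voters who each draw one vote from the same conditional distribution `probs`. The function names, the tie-breaking rule (smallest label wins), and the two example distributions are hypothetical choices made for this sketch. In particular, the abstract does not spell out the conditions under which the error rate decays, so treating "the true class is the single most likely vote" as the favourable case is an assumption.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label in `votes`.

    Ties are broken in favour of the smallest label; this rule is an
    assumption of the sketch, not something specified in the abstract.
    """
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def simulate_error_rate(probs, n_voters, n_trials, rng):
    """Estimate the MVF error rate for i.i.d. voters.

    The true class is assumed to be 0. Each voter independently votes
    for class k with probability probs[k].
    """
    labels = list(range(len(probs)))
    errors = 0
    for _ in range(n_trials):
        votes = rng.choices(labels, weights=probs, k=n_voters)
        if majority_vote(votes) != 0:
            errors += 1
    return errors / n_trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical voter distributions (true class is 0 in both cases).
    favourable = [0.5, 0.3, 0.2]    # true class is the most likely vote
    unfavourable = [0.3, 0.5, 0.2]  # a wrong class is the most likely vote
    for n in (1, 11, 51, 101):
        good = simulate_error_rate(favourable, n, 2000, rng)
        bad = simulate_error_rate(unfavourable, n, 2000, rng)
        print(f"n={n:4d}  favourable error={good:.3f}  unfavourable error={bad:.3f}")
```

Under these assumptions, the estimated error rate in the favourable case should shrink as `n` grows, while in the unfavourable case it should approach one. The sketch only shows the direction of the trend; it does not estimate the decay rate or verify the bound itself.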
