In crowdsourcing, quality control is commonly achieved by having workers examine items and vote on their correctness. To limit the impact of unreliable worker responses, a $\delta$-margin voting process is used, in which additional votes are solicited until a predetermined threshold $\delta$ for agreement between workers is exceeded. Although widely adopted, the process has so far been used only as a heuristic. We present a modeling approach based on absorbing Markov chains to analyze the characteristics of this voting process that matter in crowdsourcing. We provide closed-form equations for the quality of the resulting consensus vote, the expected number of votes required for consensus, the variance of the vote requirements, and other moments of the distribution. Our findings show how the threshold $\delta$ can be adjusted to achieve quality equivalence across voting processes that employ workers with different accuracy levels. We also provide efficiency-equalizing payment rates for voting processes with different expected response accuracy levels. In addition, our model accounts for items of varying difficulty and for uncertainty about the difficulty of each item. Simulations using real-world crowdsourced vote data validate the effectiveness of our theoretical model in characterizing the consensus aggregation process. The results can be applied in practical crowdsourcing applications.
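As a rough illustration of the quantities named above, the following minimal sketch treats the vote margin as a random walk with absorbing barriers at $+\delta$ and $-\delta$. It rests on assumptions not stated in the abstract: every worker votes correctly with the same independent probability `p`, the item is binary, and voting stops as soon as the margin reaches `delta`. The closed forms used are the standard gambler's-ruin results for such a walk and may differ in notation from the equations derived in the paper. The function names `consensus_quality`, `expected_votes`, and `simulate_item` are hypothetical.

```python
import random


def consensus_quality(p: float, delta: int) -> float:
    """Probability that the walk is absorbed at +delta (correct consensus).

    Assumption: i.i.d. worker accuracy p; standard gambler's-ruin result.
    """
    q = 1.0 - p
    return p**delta / (p**delta + q**delta)


def expected_votes(p: float, delta: int) -> float:
    """Expected number of votes until the margin reaches +/-delta."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        # Symmetric walk: expected absorption time is delta squared.
        return float(delta**2)
    return delta * (p**delta - q**delta) / ((p - q) * (p**delta + q**delta))


def simulate_item(p: float, delta: int, rng: random.Random) -> tuple[bool, int]:
    """Simulate one item: return (consensus is correct, votes used)."""
    margin = 0  # correct votes minus incorrect votes
    votes = 0
    while abs(margin) < delta:
        margin += 1 if rng.random() < p else -1
        votes += 1
    return margin > 0, votes


if __name__ == "__main__":
    p, delta, trials = 0.7, 2, 20_000  # illustrative values only
    rng = random.Random(0)
    results = [simulate_item(p, delta, rng) for _ in range(trials)]
    sim_quality = sum(ok for ok, _ in results) / trials
    sim_votes = sum(n for _, n in results) / trials
    print(f"quality: closed form {consensus_quality(p, delta):.4f}, simulated {sim_quality:.4f}")
    print(f"votes:   closed form {expected_votes(p, delta):.4f}, simulated {sim_votes:.4f}")
```

Under these assumptions, quality equivalence can be explored by searching for the smallest `delta` at which `consensus_quality` for a less accurate worker pool matches that of a more accurate one.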