An important problem on social information sites is the recovery of ground truth from individual reports when the experts are in the minority. The wisdom of the crowd, i.e., the collective opinion of a group of individuals, fails in such a scenario. However, the surprisingly popular (SP) algorithm~\cite{prelec2017solution} can recover the ground truth even when the experts are in the minority, by asking individuals to provide additional prediction reports, namely their beliefs about the reports of others. Several recent works have extended the surprisingly popular algorithm to an equivalent voting rule (SP-voting) to recover the ground-truth ranking over a set of $m$ alternatives. However, we do not yet fully understand when SP-voting can recover the ground-truth ranking, and if so, how many samples (votes and predictions) it needs. We answer this question by proposing two rank-order models and analyzing the sample complexity of SP-voting under these models. In particular, we propose concentric mixtures of Mallows and Plackett-Luce models with $G \ge 2$ groups. Our models generalize previously proposed concentric mixtures of Mallows models with $2$ groups, and we highlight the importance of $G > 2$ groups by identifying three distinct groups (expert, intermediate, and non-expert) in existing datasets. Next, we provide conditions on the parameters of the underlying models under which SP-voting recovers the ground-truth ranking with high probability, and derive sample complexities under the same conditions. We complement the theoretical results by evaluating SP-voting on simulated and real datasets.