Surprisingly Popular Voting for Concentric Rank-Order Models

An important problem on social information sites is the recovery of ground truth from individual reports when the experts are in the minority. The wisdom of the crowd, i.e., the collective opinion of a group of individuals, fails in such a scenario. However, the surprisingly popular (SP) algorithm~\cite{prelec2017solution} can recover the ground truth even when the experts are in the minority, by asking individuals for additional prediction reports: their beliefs about the reports of others. Several recent works have extended the surprisingly popular algorithm to an equivalent voting rule (SP-voting) that recovers the ground-truth ranking over a set of $m$ alternatives. However, we do not yet fully understand when SP-voting can recover the ground-truth ranking and, when it can, how many samples (votes and predictions) it needs. We answer this question by proposing two rank-order models and analyzing the sample complexity of SP-voting under these models. In particular, we propose concentric mixtures of Mallows and Plackett-Luce models with $G (\ge 2)$ groups. Our models generalize previously proposed concentric mixtures of Mallows models with $2$ groups, and we highlight the importance of $G > 2$ groups by identifying three distinct groups (expert, intermediate, and non-expert) in existing datasets. Next, we provide conditions on the parameters of the underlying models under which SP-voting recovers the ground-truth ranking with high probability, and we derive sample complexities under the same conditions. We complement the theoretical results by evaluating SP-voting on simulated and real datasets.
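
As a rough illustration of the basic SP idea described above (not the SP-voting rule for rankings analyzed in the paper), the following minimal sketch picks the option whose actual vote share most exceeds the share that respondents predicted for it. The function name `surprisingly_popular`, the input layout, and the example numbers are assumptions made for illustration only.

```python
from collections import Counter


def surprisingly_popular(votes, predictions):
    """Return the option whose actual vote share most exceeds its predicted share.

    votes:       list with one chosen option per respondent (hypothetical layout).
    predictions: list with one dict per respondent, mapping each option to the
                 fraction of other respondents they expect to choose it.
    """
    options = sorted(set(votes) | {o for p in predictions for o in p})
    counts = Counter(votes)
    # Actual share of votes received by each option.
    actual = {o: counts[o] / len(votes) for o in options}
    # Average predicted share of each option across all respondents.
    predicted = {
        o: sum(p.get(o, 0.0) for p in predictions) / len(predictions)
        for o in options
    }
    # The "surprisingly popular" option is more popular than people expected.
    return max(options, key=lambda o: actual[o] - predicted[o])


# Illustrative data: six respondents vote "yes", four (the informed minority) vote "no".
votes = ["yes"] * 6 + ["no"] * 4
predictions = (
    [{"yes": 0.9, "no": 0.1}] * 6    # "yes" voters expect almost everyone to agree
    + [{"yes": 0.7, "no": 0.3}] * 4  # "no" voters expect to be outnumbered
)
print(surprisingly_popular(votes, predictions))
```

In this made-up example the majority answer is "yes", but "no" receives 40% of the votes against an average predicted share of 18%, so the sketch selects "no".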
