We analyze a general problem in a crowdsourced setting where one user asks a question (also called an item) and other users return answers (also called labels) for this question. Unlike existing crowdsourcing work, which focuses on finding the most appropriate label for the question (the "truth"), our problem is to determine a ranking of the users based on their ability to answer questions. We call this problem "ability discovery" to emphasize the connection to, and duality with, the better-studied problem of "truth discovery". To model items and their labels in a principled way, we draw upon Item Response Theory (IRT), which is the widely accepted theory behind standardized tests such as the SAT and GRE. We start from an idealized setting where the relative performance of users is consistent across items and better users choose better-fitting labels for each item. We posit that a principled algorithmic solution to our more general problem should solve this ideal setting correctly, and we observe that the response matrices in this setting obey the Consecutive Ones Property (C1P). While C1P is well understood algorithmically, with various discrete algorithms available, we devise a novel variant of the HITS algorithm, which we call "HITSNDIFFS" (or HND), and prove that it recovers the ideal C1P-permutation whenever it exists. Unlike fast combinatorial algorithms for finding the consecutive ones permutation (if it exists), HND also returns an ordering when such a permutation does not exist. Thus it provides a principled heuristic for our problem that is guaranteed to return the correct answer in the ideal setting. Our experiments show that HND produces user rankings with robustly high accuracy compared to state-of-the-art truth discovery methods. We also show that our novel variant of HITS scales better in the number of users than ABH, the only prior spectral C1P reconstruction algorithm.
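To make the Consecutive Ones Property concrete, the following is a minimal sketch of a brute-force C1P check on a tiny binary response matrix. It is not HND and not one of the fast combinatorial algorithms mentioned above; it simply tries every row permutation. The matrix orientation (rows are users, columns are item-label pairs, and an entry is 1 when the user chose that label) and the function names `column_ones_consecutive` and `find_c1p_row_order` are assumptions made for illustration only.

```python
from itertools import permutations


def column_ones_consecutive(matrix, row_order):
    """Return True if, with rows arranged in row_order, the 1s in every
    column form a single contiguous block."""
    n_cols = len(matrix[0])
    for c in range(n_cols):
        positions = [pos for pos, r in enumerate(row_order) if matrix[r][c] == 1]
        # A block of k ones is contiguous iff it spans exactly k positions.
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def find_c1p_row_order(matrix):
    """Brute force (exponential, tiny inputs only): return the first row
    permutation under which every column has consecutive ones, or None."""
    for order in permutations(range(len(matrix))):
        if column_ones_consecutive(matrix, order):
            return order
    return None


# Hypothetical ideal data: 4 users, item A with labels a1, a2 and item B
# with labels b1, b2, b3. Columns are [a1, a2, b1, b2, b3]. Rows are stored
# in shuffled order (u2, u0, u3, u1), where u0 is weakest and u3 strongest.
shuffled = [
    [0, 1, 0, 1, 0],  # u2 chose a2, b2
    [1, 0, 1, 0, 0],  # u0 chose a1, b1
    [0, 1, 0, 0, 1],  # u3 chose a2, b3
    [1, 0, 0, 1, 0],  # u1 chose a1, b2
]
order = find_c1p_row_order(shuffled)

# A matrix with no consecutive-ones row ordering: every pair of rows would
# have to be adjacent, which is impossible for three rows in a line.
no_c1p = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
no_order = find_c1p_row_order(no_c1p)
```

In this example `order` lists the row indices by ability, up to reversal, since a consecutive-ones ordering read backwards is still consecutive. For `no_c1p` the brute-force search returns `None`, which is the case where, according to the abstract, HND still returns an ordering.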