Diverse Committees with Incomplete or Inaccurate Approval Ballots

We study diversity in approval-based committee elections with incomplete or inaccurate information. We define diversity according to the Maximum Coverage problem, which is known to be $\mathsf{NP}$-complete, with a best attainable polynomial-time approximation ratio of $1-1/e$. In the incomplete information setting, voters vote only on a small portion of the candidates, and we prove that getting arbitrarily close to the optimal approximation ratio with high probability requires $\Omega(m^2)$ non-adaptive queries, where $m$ is the number of candidates. This motivates studying adaptive querying algorithms, which can adapt their querying strategy to information obtained from previous query outcomes. In that setting, we lower this bound to only $\Omega(m)$ queries. We propose a greedy algorithm that matches this lower bound up to logarithmic factors. We prove the same $\tilde\Theta(m)$ bound for the generalized problem of Maximum Coverage over a matroid constraint, using a local search algorithm. Specifying a matroid of valid committees lets us impose extra structural requirements on the committee, such as quotas. In the inaccurate information setting, voters' responses are corrupted with a small probability. We prove that $\tilde\Theta(nm)$ queries are required to attain a $(1-1/e)$-approximation with high probability, where $n$ is the number of voters. While the proven bounds show that all our algorithms are viable asymptotically, they also show that some of them would still require large numbers of queries in instances of practical relevance. Using real data from Polis as well as synthetic data, we observe that our algorithms also perform well on smaller instances, with both incomplete and inaccurate information.
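To make the diversity objective concrete, the following is a minimal sketch of the classic greedy procedure for Maximum Coverage with complete and accurate ballots, which attains the $(1-1/e)$ ratio mentioned above. It is not the query-based algorithm of this paper. The names `greedy_max_coverage` and `coverage`, the dictionary-of-sets ballot representation, and the alphabetical tie-breaking rule are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def greedy_max_coverage(approvals: Dict[str, Set[int]], k: int) -> List[str]:
    """Select up to k candidates, each time adding the candidate approved by
    the largest number of voters not yet covered by the committee.

    approvals maps each candidate to the set of voter ids approving it
    (hypothetical representation, assuming full information).
    """
    covered: Set[int] = set()
    committee: List[str] = []
    remaining = dict(approvals)  # shallow copy so the input is not mutated
    for _ in range(min(k, len(remaining))):
        # Iterate in sorted order so ties go to the alphabetically first name.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        committee.append(best)
        covered |= remaining.pop(best)
    return committee


def coverage(approvals: Dict[str, Set[int]], committee: List[str]) -> int:
    """Number of voters who approve at least one committee member."""
    return len(set().union(*(approvals[c] for c in committee)))


if __name__ == "__main__":
    ballots = {"a": {0, 1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {0, 5}}
    chosen = greedy_max_coverage(ballots, 2)
    print(chosen, coverage(ballots, chosen))
```

In the incomplete and inaccurate settings studied here, the marginal gains `len(remaining[c] - covered)` are not directly available and would have to be estimated from voter queries, which is where the query bounds above apply.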
