We study collaborative normal mean estimation, where $m$ strategic agents collect i.i.d. samples from a normal distribution $\mathcal{N}(\mu, \sigma^2)$ at a cost. They all wish to estimate the mean $\mu$. By sharing data with each other, agents can obtain better estimates while keeping the cost of data collection small. To facilitate this collaboration, we wish to design mechanisms that encourage agents to collect a sufficient amount of data and share it truthfully, so that they are all better off than working alone. In naive mechanisms, such as simply pooling and sharing all the data, an individual agent might find it beneficial to under-collect and/or fabricate data, which can lead to poor social outcomes. We design a novel mechanism that overcomes these challenges via two key techniques: first, when sharing the others' data with an agent, the mechanism corrupts this dataset in proportion to how much the data reported by the agent differs from the others' data; second, we design minimax optimal estimators for the corrupted dataset. Our mechanism, which is incentive compatible and individually rational, achieves a social penalty (the sum of all agents' estimation errors and data collection costs) that is within a factor of 2 of the global minimum. When applied to high-dimensional (non-Gaussian) distributions with bounded variance, this mechanism retains all three properties (incentive compatibility, individual rationality, and a near-optimal social penalty), but with slightly weaker guarantees. Finally, in two special cases where we restrict the strategy space of the agents, we design mechanisms that essentially achieve the global minimum.
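The trade-off described in the abstract can be illustrated with a small numerical sketch. It is not the paper's mechanism. It assumes a linear cost of `c` per sample, the sample mean as the estimator (mean squared error $\sigma^2/n$ with $n$ samples), and real-valued sample counts; the function names and parameter values are hypothetical. Under these assumptions, the sketch compares agents working alone with naive pooling, and shows why an agent may prefer to under-collect when all data is simply pooled.

```python
import math

# Assumptions (not taken from the abstract): each sample costs c, the
# estimator is the sample mean with MSE sigma^2 / n, and sample counts
# are treated as real numbers for simplicity.

def agent_penalty(own, others_total, sigma, c):
    """Penalty of one agent under naive pooling: estimation error on
    the pooled data plus the cost of the agent's own samples."""
    return sigma**2 / (own + others_total) + c * own

def best_solo_n(sigma, c):
    """Minimizer of sigma^2 / n + c * n for an agent working alone."""
    return sigma / math.sqrt(c)

def best_pooled_total(m, sigma, c):
    """Minimizer of m * sigma^2 / N + c * N, the social penalty when
    all m agents estimate from the same pooled N samples."""
    return sigma * math.sqrt(m / c)

sigma, c, m = 1.0, 0.01, 4  # hypothetical values

n_solo = best_solo_n(sigma, c)
total = best_pooled_total(m, sigma, c)
share = total / m  # equal split of the pooled workload

# Social penalty if everyone works alone versus pooling with equal shares.
alone_social = m * agent_penalty(n_solo, 0.0, sigma, c)
pooled_social = m * agent_penalty(share, total - share, sigma, c)

# One agent collects nothing while the others keep their shares.
cooperating = agent_penalty(share, total - share, sigma, c)
free_riding = agent_penalty(0.0, total - share, sigma, c)

print(f"social penalty alone:  {alone_social:.3f}")
print(f"social penalty pooled: {pooled_social:.3f}")
print(f"agent penalty, cooperating: {cooperating:.4f}")
print(f"agent penalty, free-riding: {free_riding:.4f}")
```

In this toy setting, pooling halves the social penalty relative to working alone, yet a single agent lowers its own penalty by collecting nothing. That is the under-collection incentive the abstract attributes to naive pooling.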