We study statistical parameter estimation in the setting of data markets. A buyer seeks to estimate a parameter from samples that can be purchased from competing providers, which differ in data quality and provision cost. When quality is known ex ante, we define a cost-per-information score that summarizes each provider's provision cost per unit of information about the buyer's estimation objective. We describe a second-score procurement mechanism that ranks providers by this score and endogenously chooses both a provider and a sample size, while making truthful cost reports optimal. We then turn to the more realistic setting where data quality is private and can only be observed indirectly through the delivered data. In this setting, we propose a simple mechanism that augments the second-score rule with a lenient ex post statistical test of the reported quality. We prove that, under mild conditions, there exists an equilibrium in which sellers report costs truthfully and report quality truthfully up to deviations that vanish as the procured sample size grows. Our analysis highlights how the choice of verification test and the buyer's accuracy-cost tradeoff jointly shape participation and misreporting incentives in data markets.
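To make the known-quality case concrete, the following is a minimal sketch of a second-score rule, not the paper's exact mechanism. It rests on several assumptions that the abstract does not state: quality is modeled as information per sample, the score is reported cost per sample divided by information per sample, and the buyer's loss is the hypothetical form 1/I + p·I, where I is the total information purchased and p is the price per unit of information. All names (`Provider`, `second_score_procurement`) are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    cost_per_sample: float   # reported provision cost (assumed per-sample)
    info_per_sample: float   # known quality (assumed: information per sample)


def score(p: Provider) -> float:
    """Cost-per-information score: lower is better for the buyer."""
    return p.cost_per_sample / p.info_per_sample


def second_score_procurement(providers):
    """Pick the lowest-score provider; price and sample size depend only on
    the runner-up's score, so the winner cannot gain by misreporting cost."""
    if len(providers) < 2:
        raise ValueError("need at least two competing providers")
    ranked = sorted(providers, key=score)
    winner, runner_up = ranked[0], ranked[1]
    second = score(runner_up)  # price per unit of information

    # Assumed buyer loss 1/I + second * I is minimized at I = 1/sqrt(second).
    target_info = 1.0 / math.sqrt(second)
    n_samples = math.ceil(target_info / winner.info_per_sample)

    # The winner is paid as if its score equaled the runner-up's score.
    price_per_sample = second * winner.info_per_sample
    return winner.name, n_samples, price_per_sample


providers = [
    Provider("A", cost_per_sample=0.01, info_per_sample=4.0),
    Provider("B", cost_per_sample=0.01, info_per_sample=1.0),
    Provider("C", cost_per_sample=0.04, info_per_sample=2.0),
]
winner, n_samples, price = second_score_procurement(providers)
```

In this sketch the winner's payment and the sample size are functions of the second-lowest score only, which mirrors the standard second-price argument for truthful cost reporting: a report changes whether a provider wins, but not the terms on which it wins.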