Machine learning tasks are sensitive to the quality of their input data. Yet firms often struggle to obtain adequate datasets, because data are naturally distributed among owners who, in practice, may be competitors in a downstream market and reluctant to share information. Focusing on supervised learning for regression tasks, we develop a \textit{regression market} that provides a monetary incentive for data sharing. Our proposed mechanism adopts a Bayesian framework, allowing us to consider a more general class of regression tasks. We present a thorough exploration of the market properties, and show that similar proposals in the current literature expose market agents to sizeable financial risks, which can be mitigated in our probabilistic setting.