Motivated by the emergence of decentralized machine learning ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that address two fundamental machine learning challenges: lack of certainty in the assessment of model quality and lack of knowledge regarding the optimal performance of any model. We show that lack of certainty can be dealt with via simple linear contracts that achieve a 1-1/e fraction of the first-best utility, even if the principal has only a small test set. Furthermore, we give sufficient conditions on the size of the principal's test set under which a vanishing additive approximation to the optimal utility is achieved. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract.