Two firms are engaged in a competitive prediction task. Each firm has two sources of data (labeled historical data and unlabeled inference-time data); it uses the former to derive a prediction model and the latter to make predictions on new instances. We study data-sharing contracts between the firms. The novelty of our study is to introduce and highlight the differences between contracts that share prediction models only, contracts that share inference-time predictions only, and contracts that share both. Our analysis proceeds on three levels. First, we develop a general Bayesian framework that facilitates our study. Second, we narrow our focus to two natural settings within this framework: (i) a setting in which the accuracy of each firm's prediction model is common knowledge, but the correlation between the respective models is unknown; and (ii) a setting in which two hypotheses exist regarding the optimal predictor, and one of the firms has a structural advantage in deducing it. Within these two settings we study optimal contract choice. More specifically, we find the individually rational and Pareto-optimal contracts for some notable cases, and describe specific settings in which each of the different sharing contracts emerges as optimal. Finally, on the third level of our analysis, we demonstrate the applicability of our concepts in a synthetic simulation using real loan data.