Data sharing is central to a wide variety of applications, such as fraud detection, ad matching, and research. The lack of data sharing abstractions forces a bespoke, cost-intensive solution for each data sharing problem, hampering value generation. In this paper, we first introduce a data sharing model that represents every data sharing problem as a sequence of dataflows. From this model, we distill an abstraction, the contract, which agents use to communicate the intent of a dataflow and to evaluate its consequences before the dataflow takes place. Contracts help agents move toward a common sharing goal without violating regulatory or privacy constraints. We then design and implement the contract programming model (CPM), which allows agents to program data sharing applications tailored to each problem's needs. Contracts permit data sharing, but their interactive nature may introduce inefficiencies. To mitigate these inefficiencies, we extend the CPM so that it saves the intermediate outputs of dataflows and skips computation when a dataflow attempts to access data it is not authorized to access. In our evaluation, we show that 1) the contract abstraction is general enough to represent a wide range of sharing problems, 2) we can write programs for complex data sharing problems that exhibit qualitative improvements over alternative technologies, and 3) quantitatively, our optimizations make sharing programs written with the CPM efficient.
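The abstract does not describe the CPM's actual API, so the following is only a minimal Python sketch, under our own assumptions, of the contract idea and the two optimizations it names. A contract declares the datasets a dataflow reads. The dataflow runs only once every owner of those datasets has approved, so an unauthorized access is rejected before any computation happens. Outputs are also saved so that repeated executions reuse them. All names here (`Contract`, `ContractRuntime`, `propose`, `approve`, `execute`, and the bank/shop example data) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Contract:
    """Hypothetical contract: states which datasets a dataflow reads and what it computes."""
    contract_id: str
    proposer: str
    inputs: Tuple[str, ...]          # names of the datasets the dataflow reads
    dataflow: Callable[..., object]  # computation applied to the inputs, in order


class ContractRuntime:
    """Hypothetical runtime: gates dataflows on owner approval and saves their outputs."""

    def __init__(self, owners: Dict[str, str], data: Dict[str, object]):
        self.owners = owners                       # dataset name -> owning agent
        self.data = data                           # dataset name -> contents
        self.approvals: Dict[str, Set[str]] = {}   # contract_id -> agents who approved
        self.cache: Dict[str, object] = {}         # contract_id -> saved dataflow output
        self.runs = 0                              # number of dataflows actually computed

    def propose(self, contract: Contract) -> None:
        self.approvals.setdefault(contract.contract_id, set())

    def approve(self, agent: str, contract: Contract) -> None:
        self.approvals.setdefault(contract.contract_id, set()).add(agent)

    def required_approvers(self, contract: Contract) -> Set[str]:
        # Every owner of a dataset the dataflow reads must consent.
        return {self.owners[name] for name in contract.inputs}

    def execute(self, contract: Contract) -> Optional[object]:
        approved = self.approvals.get(contract.contract_id, set())
        # Skip computation entirely if the dataflow would touch unapproved data.
        if not self.required_approvers(contract) <= approved:
            return None
        # Reuse a saved output instead of recomputing the dataflow.
        if contract.contract_id in self.cache:
            return self.cache[contract.contract_id]
        result = contract.dataflow(*(self.data[name] for name in contract.inputs))
        self.runs += 1
        self.cache[contract.contract_id] = result
        return result


# Usage: a bank and a shop intersect their customer IDs.
owners = {"bank_ids": "bank", "shop_ids": "shop"}
data = {"bank_ids": {1, 2, 3}, "shop_ids": {2, 3, 4}}
rt = ContractRuntime(owners, data)
c = Contract("c1", "bank", ("bank_ids", "shop_ids"), lambda a, b: a & b)
rt.propose(c)
rt.approve("bank", c)
first = rt.execute(c)        # None: the shop has not approved, so nothing is computed
runs_after_denied = rt.runs  # 0
rt.approve("shop", c)
second = rt.execute(c)       # {2, 3}: computed once and saved
third = rt.execute(c)        # {2, 3}: served from the saved output
print(first, second, third, rt.runs)  # None {2, 3} {2, 3} 1
```

Checking approval before touching the inputs is what makes the "skip computation" optimization cheap in this sketch: a rejected dataflow costs only a set comparison. Keying the saved outputs by contract is a simplifying assumption; a real system would also need to invalidate them when inputs or approvals change.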