Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis

Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data and the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, such privacy-preserving efforts lack compliance with documented data privacy policies. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data based on applicable regulatory policy rules.
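
To make the idea of turning policy rules into privacy bounds more concrete, the following is a minimal sketch and not the framework from the paper. It assumes that each extracted rule tags an attribute with a sensitivity label and that each label maps to a differential-privacy budget (epsilon). The paper's deep generative model is swapped here for a plain Laplace mechanism on a clipped mean, purely for illustration. All names (`PolicyRule`, `EPSILON_BY_SENSITIVITY`, `privacy_bounds`, `release_mean`) and all numeric values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical rule extracted from a code of conduct."""
    attribute: str    # data column the rule applies to
    sensitivity: str  # "high", "medium", or "low" (assumed labels)


# Assumed mapping from sensitivity label to privacy budget (epsilon).
# Smaller epsilon means a stricter bound; the values are illustrative only.
EPSILON_BY_SENSITIVITY = {"high": 0.5, "medium": 2.0, "low": 8.0}


def privacy_bounds(rules):
    """Return {attribute: epsilon}, keeping the strictest bound per attribute."""
    bounds = {}
    for rule in rules:
        eps = EPSILON_BY_SENSITIVITY[rule.sensitivity]
        bounds[rule.attribute] = min(eps, bounds.get(rule.attribute, eps))
    return bounds


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of values clipped to [lower, upper] (Laplace mechanism)."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Changing one record moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rules = [
        PolicyRule("yield_per_hectare", "high"),
        PolicyRule("soil_ph", "low"),
    ]
    bounds = privacy_bounds(rules)
    rng = random.Random(0)
    yields = [3.1, 4.7, 5.2, 2.9, 4.0]
    noisy = release_mean(yields, 0.0, 10.0, bounds["yield_per_hectare"], rng)
    print(bounds, noisy)
```

In this sketch a stricter rule yields a smaller epsilon and therefore more noise on the released statistic; in a generative setting the same per-attribute bound would instead constrain how the model is trained.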

