Big Data gives the farming community the information it needs to optimize resource usage, increase productivity, and make agricultural practices more sustainable. Using Big Data in farming requires collecting and analyzing data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern about the security of this data and the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI Act, have been created to address data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, such privacy-preserving efforts lack compliance with documented data privacy policies. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge about the privacy constraints they place on data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data according to the applicable regulatory policy rules.