Data sharing remains a major obstacle to adopting emerging AI technologies in general, and in the agri-food sector in particular. Protectiveness of data is natural in this setting: data is a precious commodity for its owners and, if used properly, can give them useful insights into operations and processes, leading to a competitive advantage. Unfortunately, novel AI technologies often require large amounts of training data to perform well, a requirement that is unrealistic in many scenarios. However, recent machine learning advances, e.g. federated learning and privacy-preserving technologies, can offer a solution by providing the infrastructure and underpinning technologies needed to train models on data from various sources without ever sharing the raw data itself. In this paper, we propose a technical solution based on federated learning that uses decentralized data (i.e. data that is not exchanged or shared but remains with its owners) to develop a cross-silo machine learning model that facilitates data sharing across supply chains. We focus our data sharing proposition on improving production optimization through soybean yield prediction, and outline potential use cases in which such methods can assist in other problem settings. Our results demonstrate not only that our approach performs better than each of the models trained on an individual data source, but also that data sharing in the agri-food sector can be enabled via alternatives to data exchange, while also helping the sector adopt emerging machine learning technologies to boost productivity.
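To make the idea of training on decentralized data concrete, the following is a minimal sketch of federated averaging, a common federated learning scheme. It is an illustration under stated assumptions, not the model or pipeline used in this paper: the linear model, the synthetic silos, and all names (`local_update`, `federated_average`, `make_silo`, `train_federated`, `TRUE_W`) are hypothetical. Each silo runs a few gradient steps on its own data, and only the resulting weights are sent for aggregation.

```python
import random


def local_update(weights, features, targets, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs of linear regression on one silo's private data."""
    w = list(weights)
    n = len(targets)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n  # gradient of mean squared error
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(client_weights, client_sizes):
    """Combine client weight vectors, weighting each by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_silo(rng, n, true_w, noise=0.1):
    """Hypothetical synthetic data standing in for one owner's private records."""
    features = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    targets = [
        sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, noise)
        for row in features
    ]
    return features, targets


def train_federated(silos, dim, rounds=50):
    """Server loop: broadcast weights, collect local updates, average them."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Only weight vectors leave each silo; raw features and targets stay local.
        updates = [local_update(global_w, X, y) for X, y in silos]
        sizes = [len(y) for _, y in silos]
        global_w = federated_average(updates, sizes)
    return global_w


rng = random.Random(0)
TRUE_W = [2.0, -1.0, 0.5]  # assumed ground truth for the synthetic example
silos = [make_silo(rng, n, TRUE_W) for n in (40, 60, 100)]
global_w = train_federated(silos, dim=3)
print([round(w, 2) for w in global_w])
```

In this sketch the aggregation step sees only per-silo weight vectors and sample counts, which is the property that lets data owners contribute to a shared model without exchanging raw records. A real deployment would typically add secure communication and privacy-preserving measures on top of this loop.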