With the popularization of AI solutions for image-based problems, concern has grown over both data privacy and data acquisition. In many cases, information resides in separate data silos, and it can be difficult for a developer to consolidate it in a form suitable for machine learning model development. In addition, some of these local data sites may not have access to labelled ground truth. They can therefore compute numerical outputs but cannot assign classifications, because the pertinent label information is missing. This limitation is far from negligible, especially for image-based solutions, which often depend on this capability. We therefore propose an innovative vertical federated learning (VFL) model architecture that can operate under this common set of conditions. This is the first (and currently the only) implementation of a system that can work under the constraints of a VFL environment and perform image segmentation while maintaining nominal accuracies. We achieve this with a fully convolutional network (FCN) that can operate on federates lacking labelled data and privately share the respective weights with a central server, which hosts the features necessary for classification. Tests were conducted on the CamVid dataset to determine the impact of the heavy feature compression required to transfer information between federates, and to reach nominal conclusions about overall performance metrics under such constraints.