Data fusion is the integration of data and knowledge from several sources. It becomes essential when data is only available in a distributed fashion or when different sensors are used to infer a quantity of interest. In Bayesian settings, prior information about the unknown quantities is available and may be shared among the distributed estimators. When the local estimates are fused, the prior knowledge used to construct the local posteriors may be overused unless the fusion node accounts for this and corrects it. In this paper, we analyze the effects of shared priors in Bayesian data fusion. For several common fusion rules, our analysis characterizes how performance behaves as a function of the number of collaborating agents and of the type of prior. The analysis uses two divergences that are common in Bayesian inference, and the generality of the results allows them to be applied to a broad class of distributions. These theoretical results are corroborated through experiments on a variety of estimation and classification problems, including linear and nonlinear models and federated learning schemes.
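To make the shared-prior issue concrete, the following is a minimal sketch and not the paper's own method or experiments. It assumes a scalar unknown with a Gaussian prior, agents whose observations are conditionally independent given the unknown, and a product-style fusion rule. Under those assumptions the centralized posterior equals the product of the local posteriors divided by the prior K-1 times, so a fusion node that simply multiplies local posteriors counts the prior K times. All function names, noise variances, and observation values are hypothetical, chosen only for illustration.

```python
def local_posterior(y, r, m0, v0):
    """Posterior (mean, var) of x given y = x + noise, noise ~ N(0, r), prior N(m0, v0)."""
    prec = 1.0 / v0 + 1.0 / r
    mean = (m0 / v0 + y / r) / prec
    return mean, 1.0 / prec


def fuse(posteriors, m0, v0, correct_prior=True):
    """Fuse Gaussian local posteriors by multiplying their densities.

    With correct_prior=True the shared prior is divided out K-1 times,
    so it enters the fused result exactly once.
    """
    k = len(posteriors)
    prec = sum(1.0 / v for _, v in posteriors)   # precisions add under a product
    info = sum(m / v for m, v in posteriors)     # so do precision-weighted means
    if correct_prior:
        prec -= (k - 1) / v0
        info -= (k - 1) * m0 / v0
    return info / prec, 1.0 / prec


def centralized_posterior(ys, rs, m0, v0):
    """Posterior computed by a single node that sees all observations."""
    prec = 1.0 / v0 + sum(1.0 / r for r in rs)
    info = m0 / v0 + sum(y / r for y, r in zip(ys, rs))
    return info / prec, 1.0 / prec


# Hypothetical setup: prior N(0, 4) and three agents with different noise variances.
m0, v0 = 0.0, 4.0
rs = [1.0, 2.0, 0.5]
ys = [1.3, 0.7, 1.1]  # made-up observations, not data from the paper

local_posts = [local_posterior(y, r, m0, v0) for y, r in zip(ys, rs)]
naive = fuse(local_posts, m0, v0, correct_prior=False)
corrected = fuse(local_posts, m0, v0, correct_prior=True)
central = centralized_posterior(ys, rs, m0, v0)

print("naive     (mean, var):", naive)
print("corrected (mean, var):", corrected)
print("central   (mean, var):", central)
```

In this toy setting the corrected fusion matches the centralized posterior, while the naive product reports a smaller variance because the prior precision is added once per agent. The gap grows with the number of agents, which is the kind of dependence on agent count that the abstract refers to.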