Statistical heterogeneity of clients' local data is an important characteristic of federated learning, motivating personalized algorithms tailored to local data statistics. Although a plethora of algorithms have been proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored. We initiate a systematic study of such personalized unsupervised learning by developing algorithms based on optimization criteria inspired by a hierarchical Bayesian statistical framework. We develop adaptive algorithms that discover the balance between using limited local data and collaborative information, in the context of two unsupervised learning tasks: personalized dimensionality reduction and personalized diffusion models. We develop convergence analyses for our adaptive algorithms that illustrate the dependence on problem parameters (e.g., heterogeneity, local sample size). We also develop a theoretical framework for personalized diffusion models, which shows the benefits of collaboration even under heterogeneity. Finally, we evaluate our proposed algorithms on synthetic and real data, demonstrating the effective sample amplification that collaboration induces for personalized tasks despite data heterogeneity.