Standard clustering techniques assume a common configuration for all features in a dataset. However, when dealing with multi-view or longitudinal data, the number, frequencies, and shapes of the clusters may need to vary across features to accurately capture dependence structures and heterogeneity. In this setting, classical model-based clustering fails to account for within-subject dependence across domains. We introduce conditional partial exchangeability, a novel probabilistic paradigm for dependent random partitions of the same objects across distinct domains. We then study a wide class of Bayesian clustering models based on conditional partial exchangeability. These models allow for flexible dependent clustering of individuals across features, capturing the specific contribution of each feature and the within-subject dependence while remaining computationally feasible.