Identifying subtypes of complex conditions, such as Inflammatory Bowel Disease (IBD), often requires capturing latent patterns in longitudinal omics data. However, these data are typically high-dimensional, sparsely sampled, and irregularly observed over time, posing substantial challenges for conventional (bi)clustering and functional data analysis methods. We propose Tri-SfSVD, a unified sparse functional Singular Value Decomposition framework for discovering biclusters and triclusters in longitudinal data. Unlike existing functional biclustering methods that rely on ad hoc imputation or enforce restrictive shape-homogeneity assumptions, Tri-SfSVD integrates continuous trajectory estimation with simultaneous subject, feature, and temporal selection within a single optimization framework. By imposing sparse penalties across subjects, variables, and temporal subregions, the proposed method works directly on observed data to uncover localized structures at the subject, subject-feature, and subject-feature-time levels. Extensive simulations demonstrate that Tri-SfSVD outperforms existing approaches in high-dimensional settings. Applied to IBD multi-omics data, the method identified three biclusters linking sample clusters with distinct IBD-related clinical characteristics to microbial pathway groups associated with specific bacterial taxa, providing interpretable subject-pathway associations for characterizing disease heterogeneity. Applied to multi-channel EEG data, the method identified three triclusters linking sample clusters with distinct alcohol-related phenotypes to localized brain activity patterns, including subgroup differences separated by temporal subregions within the same spatial region.