Flexible Regularized Estimation in High-Dimensional Mixed Membership Models

Mixed membership models extend finite mixture models by allowing each observation to belong partially to more than one mixture component. A probabilistic framework for mixed membership models of high-dimensional continuous data is proposed, with a focus on scalability and interpretability. The novel probabilistic representation of mixed membership is based on convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured by approximating a tensor covariance structure with multivariate eigen-approximations, with adaptive regularization imposed through shrinkage priors. Conditional weak posterior consistency is established on an unconstrained model, which allows for a simple posterior sampling scheme while retaining many of the desired theoretical properties of our model. The model is motivated by two biomedical case studies: one on functional brain imaging of children with autism spectrum disorder (ASD) and one on gene expression data from breast cancer tissue. These applications highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, can be restrictive and can lead to unnatural interpretations of data features.
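To make the phrase "convex combinations of dependent multivariate Gaussian random vectors" concrete, the following is a minimal simulation sketch and not the authors' model or sampler. Everything in it is an assumption chosen for illustration: the function names, the Dirichlet draw for the membership weights, the component means, and the `rho` parameter that induces dependence through a shared Gaussian term. The tensor covariance structure, the eigen-approximations, and the shrinkage priors described in the abstract are not reproduced here.

```python
import random


def sample_membership(alpha, rng):
    """Draw convex weights from a Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_observation(means, rho, rng):
    """Return (weights, y), where y is a convex combination of K Gaussian vectors.

    means: K lists of length P (hypothetical component means).
    rho:   share of variance common to all components, in [0, 1].
           With rho > 0 the K vectors are dependent (illustrative choice).
    """
    K, P = len(means), len(means[0])
    # Shared term: the same draw enters every component, creating dependence.
    shared = [rng.gauss(0.0, 1.0) for _ in range(P)]
    components = []
    for k in range(K):
        own = [rng.gauss(0.0, 1.0) for _ in range(P)]
        components.append([
            means[k][p] + rho ** 0.5 * shared[p] + (1.0 - rho) ** 0.5 * own[p]
            for p in range(P)
        ])
    # Membership weights are non-negative and sum to one.
    weights = sample_membership([1.0] * K, rng)
    y = [sum(weights[k] * components[k][p] for k in range(K)) for p in range(P)]
    return weights, y


rng = random.Random(0)
means = [[0.0] * 5, [3.0] * 5]  # two hypothetical components in 5 dimensions
weights, y = sample_observation(means, rho=0.5, rng=rng)
```

In this sketch, a weight vector with all its mass on one component recovers the usual cluster-analysis assumption that an observation comes from a single homogeneous subgroup, while interior weights give the partial membership that the abstract argues for.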

