Clusters of similar or dissimilar objects are encountered in many fields. Frequently used approaches treat the central object of each cluster as latent. Yet objects of one or more types often cluster around objects of another type. Such arrangements are common in biomedical images of cells, in which nearby cell types likely interact. Quantifying spatial relationships may elucidate biological mechanisms. Parent-offspring statistical frameworks can be usefully applied even when central objects (parents) differ from peripheral ones (offspring). We propose the novel multivariate cluster point process (MCPP) to quantify multi-object (e.g., multi-cellular) arrangements. Unlike commonly used approaches, the MCPP exploits the observed locations of the central parent objects in clusters. It accounts for possibly multilayered, multivariate clustering. The model formulation requires specification of which object types function as cluster centers and which reside peripherally. If such information is unknown, the relative roles of object types may be explored by comparing the fit of different models via the deviance information criterion (DIC). In simulated data, we compared the DIC of a series of models; the MCPP correctly identified the simulated relationships. It also produced more accurate and precise parameter estimates than the classical univariate Neyman-Scott process model. We also used the MCPP to quantify proposed configurations and explore new ones in human dental plaque biofilm image data. MCPP models quantified simultaneous clustering of Streptococcus and Porphyromonas around Corynebacterium and of Pasteurellaceae around Streptococcus, and successfully captured hypothesized structures for all taxa. Further exploration suggested the presence of clustering between Fusobacterium and Leptotrichia, a previously unreported relationship.
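To make the parent-offspring idea concrete, the following is a minimal simulation sketch of multilayered, multivariate clustering around observed parents. It is not the MCPP as specified by the authors: the Poisson offspring counts, the isotropic Gaussian displacement (in the style of a Thomas process), the function name `simulate_offspring`, and all parameter values are illustrative assumptions. Offspring are also allowed to fall outside the unit square, since no edge handling is attempted.

```python
import numpy as np


def simulate_offspring(parents, mean_offspring, sigma, rng):
    """Scatter a Poisson number of offspring around each observed parent.

    parents        : (n, 2) array of known parent locations.
    mean_offspring : expected offspring count per parent (assumed parameter).
    sigma          : std. dev. of the isotropic Gaussian displacement
                     (assumed dispersal kernel, not taken from the source).
    """
    # One Poisson count per parent
    counts = rng.poisson(mean_offspring, size=len(parents))
    # Repeat each parent location once per offspring it produces
    centers = np.repeat(parents, counts, axis=0)
    # Displace every offspring from its parent
    return centers + rng.normal(scale=sigma, size=centers.shape)


rng = np.random.default_rng(0)

# Layer 0: a parent type placed uniformly in the unit square (hypothetical)
type_a = rng.uniform(0.0, 1.0, size=(20, 2))

# Layer 1: two offspring types cluster around the same parent type
type_b = simulate_offspring(type_a, mean_offspring=5.0, sigma=0.02, rng=rng)
type_c = simulate_offspring(type_a, mean_offspring=3.0, sigma=0.05, rng=rng)

# Layer 2: a further type clusters around type B, giving a second layer
type_d = simulate_offspring(type_b, mean_offspring=2.0, sigma=0.01, rng=rng)
```

Because the parent locations are kept rather than treated as latent, a fitting procedure could condition on them directly, which is the contrast with the classical Neyman-Scott setting drawn above.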