In the context of network data, bipartite networks are of particular interest, as they provide a useful description of systems in which relationships run between sending and receiving nodes. In this framework, we extend the Mixture of Latent Trait Analyzers (MLTA) to perform a joint clustering of sending and receiving nodes, as in the biclustering framework. In detail, sending nodes are partitioned into clusters (called components) via a finite mixture of latent trait models. Within each component, receiving nodes are partitioned into clusters (called segments) by adopting a flexible and parsimonious specification of the linear predictor. Dependence between receiving nodes is modeled via a multidimensional latent trait, as in the original MLTA specification. The proposal also allows for the inclusion of concomitant variables in the latent layer of the model, with the aim of understanding how they influence component formation. To estimate the model parameters, we propose an EM-type algorithm in which intractable integrals are approximated by Gauss-Hermite quadrature. A simulation study is conducted to assess the performance of the model in terms of clustering and parameter recovery. The proposed model is applied to a bipartite network of pediatric patients possibly affected by appendicitis, with the objective of identifying groups of patients (sending nodes) that are similar with respect to subsets of clinical conditions (receiving nodes).
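To illustrate the Gauss-Hermite step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a single component, a one-dimensional standard normal latent trait, and a logistic link, whereas the proposed model uses a multidimensional trait within a mixture. The function name `marginal_prob` and the parameters `b` (intercepts) and `w` (slopes) are hypothetical names introduced here for illustration only.

```python
import itertools
import numpy as np


def marginal_prob(x, b, w, n_nodes=21):
    """Approximate P(x) = integral of prod_m Bernoulli(x_m | p_m(u)) * N(u; 0, 1) du,
    where logit p_m(u) = b_m + w_m * u (assumed one-dimensional latent trait)."""
    # Nodes and weights for the weight function exp(-t^2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variable u = sqrt(2) * t turns the rule into an
    # expectation under a standard normal density.
    u = np.sqrt(2.0) * nodes
    omega = weights / np.sqrt(np.pi)  # normalized weights, sum to 1
    # Linear predictor at every quadrature node: shape (n_nodes, M).
    eta = b[None, :] + np.outer(u, w)
    p = 1.0 / (1.0 + np.exp(-eta))
    # Conditional likelihood of the binary response vector at each node.
    lik = np.prod(np.where(x[None, :] == 1, p, 1.0 - p), axis=1)
    return float(omega @ lik)


# Hypothetical parameter values for three receiving nodes.
b = np.array([-0.5, 0.2, 1.0])
w = np.array([0.8, -0.3, 1.2])

# The approximated probabilities of all 2^3 response patterns should sum to one.
total = sum(
    marginal_prob(np.array(pattern), b, w)
    for pattern in itertools.product([0, 1], repeat=3)
)
```

In an EM-type scheme, quantities of this kind would be evaluated for each sending node and each component in the E-step. With a multidimensional trait the same idea applies on a product grid of nodes, at a cost that grows quickly with the dimension.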