We introduce a random recursive tree model with two communities, called the balanced community modulated random recursive tree (BCMRT). In this model, pairs of nodes of different types arrive sequentially. Each node of a pair independently decides to attach to its own type with probability 1-q, or to the other type with probability q, and then chooses its parent uniformly at random among the existing nodes of the selected type. We find that the limiting degree distribution is the same for all values of q, so inference on q must rely on other statistics. We first consider the setting where the time-labels of the nodes, i.e. their arrival times, are observed but their types are not. In this setting, we design a consistent estimator for q and provide bounds on the feasibility of testing between two different values of q. Moreover, we show that if q is small enough, then it is possible to cluster the nodes in a way that is correlated with the true partition, although the algorithm runs in exponential time. In the unlabelled setting, i.e. when only the tree structure is observed, we show that it is possible to test between different values of q strictly better than by random guessing. This follows from a delicate analysis of the sum-of-distances statistic.
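To make the attachment rule concrete, the following is a minimal simulation sketch, not the authors' code. Several details are assumptions that the abstract does not specify: the initial tree is taken to be a single edge joining one node of each type, both nodes of a pair choose their parents among the nodes that arrived before the pair, and the names `simulate_bcmrt` and `cross_type_fraction` are hypothetical.

```python
import random


def simulate_bcmrt(n_pairs, q, seed=None):
    """Simulate a BCMRT-style tree; return (parents, types).

    Assumptions (not stated in the abstract): the tree starts as a single
    edge between node 0 (type 0) and node 1 (type 1), and the two nodes of
    a pair only see nodes that arrived before the pair.
    """
    rng = random.Random(seed)
    types = [0, 1]            # types[i] is the community of node i
    parents = [None, 0]       # node 0 is the root, node 1 hangs off it
    members = {0: [0], 1: [1]}  # existing nodes, grouped by type

    for _ in range(n_pairs):
        arrivals = []
        for t in (0, 1):      # one new node of each type per step
            # With probability q attach to the other type, else to own type.
            target = 1 - t if rng.random() < q else t
            # Parent is uniform among existing nodes of the selected type.
            parent = rng.choice(members[target])
            node = len(types)
            types.append(t)
            parents.append(parent)
            arrivals.append((t, node))
        # The pair becomes available as parents only after both have attached.
        for t, node in arrivals:
            members[t].append(node)
    return parents, types


def cross_type_fraction(parents, types):
    """Fraction of non-initial edges joining nodes of different types.

    This uses the hidden types, so it is only a sanity check on the
    simulation, not the estimator discussed in the text.
    """
    edges = range(2, len(parents))  # skip the assumed initial edge
    cross = sum(types[i] != types[parents[i]] for i in edges)
    return cross / len(edges)
```

For example, `parents, types = simulate_bcmrt(1000, 0.2, seed=1)` yields a tree on 2002 nodes with 1001 nodes of each type, and `cross_type_fraction(parents, types)` should be close to q because each arriving node crosses communities independently with probability q.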