We introduce a random recursive tree model with two communities, called the balanced community modulated random recursive tree (BCMRT for short). In this setting, pairs of nodes of different types arrive sequentially. Each node of the pair independently decides to attach to its own type with probability 1-q, or to the other type with probability q, and then chooses its parent uniformly among the existing nodes of the selected type. We find that the limiting degree distributions coincide for different values of q. Therefore, for the purpose of inference, other statistics have to be studied. We first consider the setting where the time labels of the nodes, i.e., their times of arrival, are observed but their types are not. In this setting, we design a consistent estimator for q and provide bounds on the feasibility of testing between two different values of q. Moreover, we show that if q is small enough, then it is possible to cluster the nodes in a way that is correlated with the true partition, although the algorithm runs in exponential time (in passing, we show that our clustering procedure is intimately connected to the NP-hard problem of minimum fair bisection). In the unlabelled setting, i.e., when only the tree structure is observed, we show that it is possible to test between different values of q strictly better than by random guessing. This follows from a delicate analysis of the sum-of-distances statistic.
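To make the attachment rule concrete, the following is a minimal simulation sketch of the growth process described above, not the authors' code. Several details are assumptions, since the abstract does not fix them: the tree is initialised with node 0 (type 0) as root and node 1 (type 1) as its child, and both nodes of an arriving pair choose their parents among nodes from earlier steps only. The names `simulate_bcmrt` and `cross_type_fraction` are hypothetical. The second function uses the true types, which are available only in simulation, so it is a sanity check rather than the estimator studied in the paper.

```python
import random


def simulate_bcmrt(n_pairs, q, seed=None):
    """Grow a two-community recursive tree by adding n_pairs pairs of nodes.

    Returns (parent, node_type), two lists indexed by arrival order.
    """
    rng = random.Random(seed)
    # Assumed initial condition: node 0 (type 0) is the root,
    # node 1 (type 1) is attached to it.
    parent = [None, 0]
    node_type = [0, 1]
    by_type = [[0], [1]]  # indices of existing nodes of each type

    for _ in range(n_pairs):
        choices = []
        for t in (0, 1):
            # Attach to the other type with probability q, else to own type.
            target = 1 - t if rng.random() < q else t
            # Uniform parent among existing nodes of the selected type
            # (assumption: the other node of the same pair is not eligible).
            choices.append((t, rng.choice(by_type[target])))
        # Add both nodes of the pair only after both have chosen.
        for t, p in choices:
            v = len(parent)
            parent.append(p)
            node_type.append(t)
            by_type[t].append(v)
    return parent, node_type


def cross_type_fraction(parent, node_type):
    """Fraction of non-initial nodes whose parent has the other type."""
    later = range(2, len(parent))
    if len(later) == 0:
        return 0.0
    cross = sum(node_type[v] != node_type[parent[v]] for v in later)
    return cross / len(later)
```

For example, `simulate_bcmrt(1000, 0.2, seed=1)` returns a tree on 2002 nodes, and `cross_type_fraction` applied to it should be close to q = 0.2, since each arriving node crosses communities independently with probability q.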