Given a dataset consisting of a single realization of a network, we consider conducting inference on a parameter selected from the data. In particular, we focus on the setting where the parameter of interest is a linear combination of the mean connectivities within and between estimated communities. Inference in this setting poses a challenge, since the communities are themselves estimated from the data. Furthermore, since only a single realization of the network is available, sample splitting is not possible. In this paper, we show that it is possible to split a single realization of a network consisting of $n$ nodes into two (or more) networks involving the same $n$ nodes; the first network can be used to select a data-driven parameter, and the second to conduct inference on that parameter. In the case of weighted networks with Poisson or Gaussian edges, we obtain two independent realizations of the network; by contrast, in the case of Bernoulli edges, the two realizations are dependent, and so extra care is required. We establish the theoretical properties of our estimators, in the sense of confidence intervals that attain the nominal (selective) coverage, and demonstrate their utility in numerical simulations and in application to a dataset representing the relationships among dolphins in Doubtful Sound, New Zealand.
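The abstract does not spell out the splitting mechanism. For Poisson edges, one standard construction that yields two independent networks on the same $n$ nodes is binomial thinning: if $A_{ij} \sim \mathrm{Poisson}(\lambda_{ij})$ and $A^{(1)}_{ij} \mid A_{ij} \sim \mathrm{Binomial}(A_{ij}, \varepsilon)$, then $A^{(1)}_{ij} \sim \mathrm{Poisson}(\varepsilon\lambda_{ij})$ and $A^{(2)}_{ij} = A_{ij} - A^{(1)}_{ij} \sim \mathrm{Poisson}((1-\varepsilon)\lambda_{ij})$ are independent. The sketch below illustrates this under stated assumptions and may differ from the paper's exact procedure. The function name `thin_poisson_network`, the parameter `eps`, and the toy data are hypothetical, and the network is assumed to be undirected with integer weights and no self-loops.

```python
import numpy as np

def thin_poisson_network(A, eps=0.5, rng=None):
    """Split a symmetric, count-valued adjacency matrix into two networks.

    Assumes an undirected network with no self-loops (zero diagonal).
    Each edge count is thinned once, so both outputs stay symmetric.
    """
    rng = np.random.default_rng(rng)
    A = np.asarray(A)
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)               # indices of the strict upper triangle
    A_train = np.zeros_like(A)
    A_train[iu] = rng.binomial(A[iu], eps)     # keep each unit of weight with prob. eps
    A_train = A_train + A_train.T              # mirror to the lower triangle
    A_test = A - A_train                       # the remainder forms the second network
    return A_train, A_test

# Toy data (hypothetical): 6 nodes, every edge Poisson with mean 3.
rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.poisson(3.0, size=(n, n)), k=1)
A = upper + upper.T

A_train, A_test = thin_poisson_network(A, eps=0.5, rng=1)
```

In this sketch, `A_train` would be passed to a community detection method to select the parameter, and `A_test` would be used to estimate the corresponding mean connectivities, rescaled by $1/(1-\varepsilon)$ to return to the scale of the original network.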