Consider a tree $T=(V,E)$ with root $\circ$ and edge length function $\ell:E\to\mathbb{R}_+$. The phylogenetic covariance matrix of $T$ is the matrix $C$ whose rows and columns are indexed by the leaf set $L$ of $T$, with entries $C(i,j):=\sum_{e\in[i\wedge j,\circ]}\ell(e)$ for each $i,j\in L$. Recent work [15] has shown that, with overwhelmingly high probability, the phylogenetic covariance matrix of a large random binary tree $T$ becomes significantly sparse under a change of basis to the so-called Haar-like wavelets of $T$. Notably, this finding makes it possible to manipulate the spectrum of the covariance matrix of a large binary tree without storing the matrix in computer memory, by performing two post-order traversals of the tree instead. Building on the methods of [15], this manuscript extends that sparsification result to the broader class of $k$-regular trees, for any given $k\ge2$. The extension is achieved by refining existing asymptotic formulas for the mean and variance of the internal path length of random $k$-regular trees, using properties and identities of hypergeometric functions.
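To make the definition of $C$ concrete, the following minimal Python sketch evaluates $C(i,j)$ on a small toy tree. It assumes that $i\wedge j$ denotes the most recent common ancestor of leaves $i$ and $j$, and that $[i\wedge j,\circ]$ is the set of edges on the path from that ancestor to the root. The tree, its edge lengths, and all names (`PARENT`, `LEAVES`, `phylo_cov`, and so on) are hypothetical and chosen only for illustration. This is the naive dense computation, not the wavelet-based method of [15].

```python
from itertools import product

# Hypothetical toy tree: child -> (parent, length of the edge to the parent).
# The root "root" has no entry. Leaves are a, b, c; u is an internal node.
PARENT = {
    "u": ("root", 1.0),
    "a": ("u", 2.0),
    "b": ("u", 3.0),
    "c": ("root", 4.0),
}
LEAVES = ["a", "b", "c"]


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]][0])
    return path


def depth(node):
    """Sum of edge lengths on the path from `node` to the root."""
    return sum(PARENT[v][1] for v in path_to_root(node) if v in PARENT)


def mrca(i, j):
    """Most recent common ancestor of i and j (assumed meaning of i ^ j)."""
    ancestors_of_i = set(path_to_root(i))
    for v in path_to_root(j):
        if v in ancestors_of_i:
            return v
    raise ValueError("nodes do not share a root")


def phylo_cov(leaves):
    """Dense covariance as a dict: C[(i, j)] = weighted depth of mrca(i, j)."""
    return {(i, j): depth(mrca(i, j)) for i, j in product(leaves, repeat=2)}


C = phylo_cov(LEAVES)
for i in LEAVES:
    print([C[(i, j)] for j in LEAVES])
```

In this toy tree, leaves `a` and `b` share the edge from `u` to the root, so their covariance equals that edge's length, while `c` shares no edge with either and has zero covariance with both. Storing $C$ this way takes space quadratic in the number of leaves, which is the cost the traversal-based approach described above avoids.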