We propose a general method for optimally approximating an arbitrary matrix $\mathbf{M}$ by a structured matrix $\mathbf{T}$ (circulant, Toeplitz/Hankel, etc.) and examine its use for estimating the spectra of genomic linkage disequilibrium matrices. This application is prototypical of a variety of genomic and proteomic problems that demand robustness to incomplete biosequence information. We perform a simulation study and corroborative test of our method using real genomic data from the Mouse Genome Database. The results confirm the predicted utility of the method and provide strong evidence of its potential value to a wide range of bioinformatics applications. Our optimal general matrix approximation method is expected to be of independent interest to an even broader range of applications in applied mathematics and engineering.
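As a point of reference for what "optimal structured approximation" can mean, the following is a minimal sketch under one simple assumption: optimality is measured in the Frobenius norm. Under that assumption, the nearest Toeplitz matrix is obtained by averaging each diagonal of $\mathbf{M}$, and the nearest circulant matrix by averaging each wrapped diagonal. The abstract does not state which norm or algorithm the method uses, so this is an illustration of the general idea, not the method itself. The function names `nearest_toeplitz` and `nearest_circulant` and the toy input matrix are hypothetical.

```python
import numpy as np


def nearest_toeplitz(M):
    """Frobenius-norm projection of a square matrix onto Toeplitz matrices.

    Assumption: "optimal" means least squares (Frobenius norm), in which
    case each diagonal of M is replaced by its mean.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    i, j = np.indices((n, n))
    T = np.empty_like(M)
    for k in range(-(n - 1), n):
        mask = (j - i) == k          # entries on the k-th diagonal
        T[mask] = M[mask].mean()
    return T


def nearest_circulant(M):
    """Frobenius-norm projection of a square matrix onto circulant matrices.

    Entries are grouped by the wrapped offset (j - i) mod n and averaged,
    giving C[i, j] = c[(j - i) mod n].
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    i, j = np.indices((n, n))
    offset = (j - i) % n
    c = np.array([M[offset == k].mean() for k in range(n)])
    return c[offset]


if __name__ == "__main__":
    # Toy symmetric matrix with correlations decaying away from the diagonal,
    # plus symmetric noise. This is synthetic, not real LD data.
    rng = np.random.default_rng(0)
    n = 6
    idx = np.arange(n)
    clean = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    noise = rng.normal(scale=0.05, size=(n, n))
    M = clean + (noise + noise.T) / 2

    T = nearest_toeplitz(M)
    C = nearest_circulant(M)

    # The spectrum of a circulant matrix is the DFT of its first row,
    # so it can be read off without a full eigendecomposition.
    circ_spectrum = np.sort(np.fft.fft(C[0]).real)

    print("Toeplitz residual :", np.linalg.norm(M - T))
    print("Circulant residual:", np.linalg.norm(M - C))
    print("Circulant spectrum:", circ_spectrum)
```

The circulant case is of interest for spectral estimation because its eigenvalues follow directly from a fast Fourier transform of the first row. For a symmetric input such as a correlation-type matrix, the projected circulant is also symmetric and its spectrum is real.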