Despite years of research into the merits and trade-offs of various model selection criteria, robust results that explain the behavior of cross-validation remain difficult to obtain. In this paper, we highlight the inherent limitations of cross-validation when it is used to learn the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node in a Gaussian graphical model, tuned using a prediction oracle, misidentifies the neighborhood. Our results apply to both undirected graphs and directed acyclic graphs, and cover general sparse covariance structures. To support the theory, we examine this inconsistency empirically in an extensive simulation study, comparing the results with other commonly used information criteria. Because many algorithms for learning the structure of graphical models require hyperparameter selection, calibrating this hyperparameter precisely is essential for accurately estimating the underlying structure. Our observations therefore shed light on this widely recognized practical challenge.
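The setting described above, Lasso neighborhood selection with a regularization parameter chosen for prediction, can be illustrated with a minimal, dependency-free sketch. This is not the paper's code or its experimental design. The chain-structured model, the coefficient 0.8, the sample size, the lambda grid, and all function names are assumptions made for illustration, and ordinary K-fold cross-validation stands in for the prediction oracle.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, sweeps=50):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(sweeps):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # Correlation of column k with the partial residual (column k added back).
            rho = sum(X[i][k] * (resid[i] + X[i][k] * beta[k]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][k] * delta
                beta[k] = new
    return beta

def cv_lambda(X, y, lambdas, folds=4):
    """Pick the lambda with the smallest K-fold mean squared prediction error."""
    n = len(X)
    errors = []
    for lam in lambdas:
        sse = 0.0
        for f in range(folds):
            train = [i for i in range(n) if i % folds != f]
            test = [i for i in range(n) if i % folds == f]
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in test:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                sse += (y[i] - pred) ** 2
        errors.append(sse / n)
    best = min(range(len(lambdas)), key=errors.__getitem__)
    return lambdas[best], errors

def simulate(n, seed=0):
    """Assumed toy model: chain X0 -> X1 -> X2 plus two independent nodes X3, X4."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = 0.8 * x0 + rng.gauss(0, 1)
        x2 = 0.8 * x1 + rng.gauss(0, 1)
        rows.append([x0, x1, x2, rng.gauss(0, 1), rng.gauss(0, 1)])
    return rows

def split_node(rows, node):
    """Return (X, y, labels): the node's column as response, the rest as predictors."""
    labels = [k for k in range(len(rows[0])) if k != node]
    X = [[r[k] for k in labels] for r in rows]
    y = [r[node] for r in rows]
    return X, y, labels

rows = simulate(120)
X, y, labels = split_node(rows, node=1)
lambdas = [0.4, 0.2, 0.1, 0.05, 0.025, 0.0125]  # illustrative grid
lam_cv, cv_errors = cv_lambda(X, y, lambdas)
beta_cv = lasso_cd(X, y, lam_cv)
estimated = {labels[k] for k, b in enumerate(beta_cv) if b != 0.0}
true_neighbors = {0, 2}  # neighbors of node 1 in the assumed chain

print("lambda chosen by CV:", lam_cv)
print("estimated neighborhood:", sorted(estimated))
print("spurious neighbors:", sorted(estimated - true_neighbors))
```

Rerunning the sketch with different seeds and comparing `estimated` against `true_neighbors` gives a rough, informal view of how often a prediction-tuned penalty admits the independent nodes 3 and 4; it is not a substitute for the finite-sample bounds or the simulation study.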