Parameter selection in high-dimensional models is typically fine-tuned to keep the (relative) number of false positives under control, because otherwise the few true positives may be dominated by the many possible false positives. This happens, for instance, when the selection follows from a naive optimisation of an information criterion such as AIC or Mallows's Cp. It can be argued that this overestimation of the selection arises because the optimisation process itself changes the statistics of the selected variables, so that the information criterion no longer reflects the true divergence between the selected model and the data-generating process. In the lasso, the overestimation can also be linked to the shrinkage estimator, which makes the selection too tolerant of false positives. For these reasons, this paper studies refined information criteria that carefully balance false positives and false negatives, for use with estimators without shrinkage. In particular, it develops corrected Mallows's Cp criteria for structured selection in trees and graphical models.