The Bayesian and Akaike information criteria (BIC and AIC) aim to strike a good balance between under- and over-fitting, and practitioners use them extensively. Yet we contend that they suffer from at least two afflictions: their penalty parameters, $λ=\log n$ and $λ=2$ respectively, are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimensions. We alleviate these issues with the pivotal information criterion (PIC): PIC is defined as a continuous optimization problem, and its penalty parameter $λ$ is selected at the detection boundary (under pure noise). PIC's choice of $λ$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon previously studied in the noiseless setting of compressed sensing. Applied to real data, PIC selects the least complex model among state-of-the-art learners with similar predictive performance.
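To make the idea of choosing $λ$ as a quantile of a pivotal statistic under pure noise concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the PIC procedure itself: it assumes a Gaussian linear model and borrows the square-root LASSO loss $\|y - Xβ\|_2 + λ\|β\|_1$, for which the fitted coefficients are all zero exactly when $\|X^\top y\|_\infty / \|y\|_2 \le λ$. Under pure noise $y = σε$ this ratio does not depend on $σ$, so its upper quantile can be estimated by Monte Carlo. The function name `pivotal_lambda` and the parameters `alpha`, `n_sim`, and `sigma` are hypothetical.

```python
import numpy as np

def pivotal_lambda(X, alpha=0.05, n_sim=2000, sigma=1.0, seed=0):
    """Monte Carlo (1 - alpha)-quantile of ||X^T y||_inf / ||y||_2 under pure noise.

    Assumption (not from the source): square-root LASSO loss with Gaussian noise.
    `sigma` is exposed only to show that the result does not depend on it.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row is one simulated pure-noise response y = sigma * eps.
    noise = sigma * rng.standard_normal((n_sim, n))
    # Smallest lambda that zeroes every coefficient, one value per simulation.
    stat = np.abs(noise @ X).max(axis=1) / np.linalg.norm(noise, axis=1)
    return float(np.quantile(stat, 1.0 - alpha))

# Hypothetical design matrix with unit-norm columns.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 200))
X /= np.linalg.norm(X, axis=0)

lam = pivotal_lambda(X, alpha=0.05)
```

Because the noise scale cancels in the ratio, calling `pivotal_lambda(X, sigma=10.0)` with the same seed returns the same value up to floating-point error, which is the sense in which the statistic is pivotal in this simplified setting. A smaller `alpha` yields a larger $λ$ and hence a more conservative selection.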