The Bayesian and Akaike information criteria aim to balance under- and over-fitting, and practitioners use them extensively. Yet we contend that they suffer from at least two afflictions: their penalty parameters, $λ=\log n$ and $λ=2$ respectively, are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimension. We alleviate these issues with the pivotal information criterion (PIC): PIC is defined as a continuous optimization problem, and its penalty parameter $λ$ is selected at the detection boundary (under pure noise). PIC's choice of $λ$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied in the noiseless setting in compressed sensing. Applied to real data, PIC selects the least complex model among state-of-the-art learners with similar predictive performance.
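To illustrate what selecting $λ$ as a quantile of a pivotal statistic under pure noise can look like, here is a minimal sketch for one well-known case, the square-root LASSO, whose solution is zero exactly when $λ \ge \|X^\top y\|_\infty / \|y\|_2$. Under pure noise $y = σz$ this ratio does not depend on $σ$, so its quantiles can be simulated without knowing the noise level. The function names `pivotal_lambda` and `zero_threshold_stat`, the Gaussian noise model, the level `alpha`, and the Monte Carlo size `n_sim` are assumptions made for illustration; they are not taken from the abstract, and PIC's actual statistic and loss transformation may differ.

```python
import numpy as np


def zero_threshold_stat(X, y):
    """Smallest lambda for which the square-root LASSO fit is all zeros.

    Illustrative assumption: loss ||y - X b||_2 with an l1 penalty on b.
    """
    return float(np.max(np.abs(X.T @ y)) / np.linalg.norm(y))


def pivotal_lambda(X, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of the statistic under pure noise.

    Hypothetical helper: simulates y = z with z ~ N(0, I). Because the
    statistic is scale-free, the unknown noise level never enters.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Z = rng.standard_normal((n, n_sim))  # one pure-noise response per column
    stats = np.max(np.abs(X.T @ Z), axis=0) / np.linalg.norm(Z, axis=0)
    return float(np.quantile(stats, 1.0 - alpha))
```

With this choice, a pure-noise response yields the empty model with probability close to $1-α$ (up to Monte Carlo error), which is the sense in which $λ$ sits at the detection boundary in this sketch.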