We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. The proposed spectral information criterion (SIC) is also more general than these criteria in that, for example, knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, can be considered an automatic elbow detector. SIC provides a subset of all possible models, whose cardinality is often much smaller than the total number of possible models. The elements of this subset are the elbows of the error curve. A practical rule for selecting a unique model within the set of elbows is also suggested. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios, where it always provides the optimal expected results. We also test SIC in several numerical experiments, some involving synthetic data and two involving real datasets. All of them address real-world applications such as clustering, variable selection, and polynomial order selection, to name a few. The results show the benefits of the proposed scheme. MATLAB code related to the experiments is also provided. Finally, possible future research lines are discussed.
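As a rough illustration of the criteria that SIC is said to generalize, the sketch below treats AIC and BIC as two members of one family: each picks the model order k that minimizes an error curve plus a linear penalty, V(k) + λk, with λ = 2 for AIC and λ = log N for BIC, where V(k) is minus twice the maximized log-likelihood. This is not the SIC procedure itself, which the abstract does not specify. The function name `penalized_choice`, the error-curve values, and the sample size `N` are hypothetical and chosen only for illustration.

```python
import math


def penalized_choice(error_curve, lam):
    """Return the index k that minimizes error_curve[k] + lam * k."""
    costs = [v + lam * k for k, v in enumerate(error_curve)]
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical error curve: V[k] stands for -2 * (maximized log-likelihood)
# of the model with k parameters. The values are made up for illustration.
V = [100.0, 60.0, 30.0, 27.0, 25.5, 24.9]
N = 50  # assumed number of data points

k_aic = penalized_choice(V, 2.0)          # AIC: penalty slope 2
k_bic = penalized_choice(V, math.log(N))  # BIC: penalty slope log(N)
print(k_aic, k_bic)  # prints "3 2"
```

On this made-up curve the two slopes select different orders, which shows how the answer depends on the chosen penalty. An elbow-based criterion, as described above, would instead return a small set of candidate models.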