Information Value (IV) is a widely used technique for feature selection prior to the modeling phase, particularly in credit scoring and related domains. However, conventional IV-based practice relies on fixed empirical thresholds, which lack statistical justification and may be sensitive to data characteristics such as class imbalance. In this work, we develop a formal statistical framework for IV by establishing its connection with Jeffreys divergence, and we propose a novel nonparametric hypothesis test, referred to as the J-Divergence test. Our method provides rigorous asymptotic guarantees and enables interpretable decisions based on \(p\)-values. Numerical experiments on synthetic and real-world data demonstrate that the proposed test is more reliable than traditional IV thresholding, particularly under strong class imbalance. The test is model-agnostic, computationally efficient, and well-suited for the pre-modeling phase in high-dimensional or imbalanced settings. An open-source Python library is provided for reproducibility and practical adoption.
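To make the IV/Jeffreys-divergence connection concrete, the sketch below computes IV for a binned numeric feature as the Jeffreys divergence between the class-conditional bin distributions. This is a minimal illustration, not the paper's library: the function name, quantile binning, and additive smoothing are assumptions made here for a self-contained example.

```python
import numpy as np

def information_value(x, y, bins=10):
    """Compute the IV of feature x for a binary target y.

    With p and q the binned distributions of x among positives and
    negatives, IV = sum_i (p_i - q_i) * ln(p_i / q_i), i.e. the
    Jeffreys divergence between p and q (nonnegative by construction).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Quantile bin edges; drop duplicates to keep edges strictly increasing.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, bins + 1)))
    pos, _ = np.histogram(x[y == 1], bins=edges)
    neg, _ = np.histogram(x[y == 0], bins=edges)
    # Small additive smoothing (an assumed convention here) avoids log(0)
    # in bins that are empty for one class.
    p = (pos + 0.5) / (pos + 0.5).sum()
    q = (neg + 0.5) / (neg + 0.5).sum()
    return float(np.sum((p - q) * np.log(p / q)))
```

In practice, an informative feature should score a noticeably larger IV than pure noise; the proposed J-Divergence test replaces the usual fixed cutoffs on this statistic (e.g. "IV > 0.1") with a calibrated \(p\)-value.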