As Artificial Intelligence (AI) models are increasingly adopted in real-life applications, the explainability of the model used is critical, especially in high-stakes areas such as medicine and finance. Among commonly used models, Linear Discriminant Analysis (LDA) is a widely used classification tool that is also explainable, thanks to its ability to model class distributions and maximize class separation through linear feature combinations. Nevertheless, real-world data is frequently incomplete, which poses significant challenges for both classification and model explanation. In this paper, we propose a novel approach to LDA under missing data, termed \textbf{\textit{Weighted missing Linear Discriminant Analysis (WLDA)}}, which classifies observations containing missing values directly, without imputation. WLDA estimates the model parameters directly from the incomplete data and uses a weight matrix to penalize missing entries during classification. Furthermore, we analyze the theoretical properties of the proposed technique and examine its explainability in a comprehensive manner. Experimental results demonstrate that WLDA outperforms conventional methods by a significant margin, particularly when missing values are present in both the training and test sets.