Exponential Family Discriminant Analysis: Generalizing LDA-Style Generative Classification to Non-Gaussian Models

We introduce Exponential Family Discriminant Analysis (EFDA), a unified generative framework that extends classical Linear Discriminant Analysis (LDA) beyond the Gaussian setting to any member of the exponential family. Under the assumption that each class-conditional density belongs to a common exponential family, EFDA derives closed-form maximum-likelihood estimators for all natural parameters and yields a decision rule that is linear in the sufficient statistic. This rule recovers LDA as a special case while capturing nonlinear decision boundaries in the original feature space. We prove that EFDA is asymptotically calibrated and statistically efficient under correct specification, and we generalize it to $K \geq 2$ classes and multivariate data. In extensive simulations across five exponential-family distributions (Weibull, Gamma, Exponential, Poisson, Negative Binomial), EFDA matches the classification accuracy of LDA, QDA, and logistic regression while reducing Expected Calibration Error (ECE) by a factor of $2$ to $6$. This gap is structural: it persists at every sample size $n$ and across all class-imbalance levels, because misspecified models remain asymptotically miscalibrated. We further prove and empirically confirm that EFDA's log-odds estimator approaches the Cramér-Rao bound under correct specification, and that it is the only estimator in our comparison whose mean squared error converges to zero. Complete derivations are provided for nine distributions. Finally, we formally verify all four theoretical propositions in Lean 4, using Aristotle (Harmonic) and OpenGauss (Math, Inc.) as proof generators, with all outputs independently machine-checked by AXLE (Axiom).
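To make the "linear in the sufficient statistic" claim concrete, the sketch below works through the binary Poisson case. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`fit_poisson_efda`, `poisson_log_odds`, `posterior_class1`) are hypothetical, the classes are labelled 0 and 1, and every class is assumed to have a strictly positive sample mean. For the Poisson family the sufficient statistic is $T(x) = x$, the natural parameter is $\eta = \log \lambda$, and the log-partition function is $A(\eta) = e^{\eta} = \lambda$, so the log-odds are $\log(\pi_1/\pi_0) + (\eta_1 - \eta_0)\,x - (A(\eta_1) - A(\eta_0))$; the base-measure term $-\log x!$ is shared by both classes and cancels.

```python
import numpy as np


def fit_poisson_efda(x, y):
    """Closed-form ML estimates for a Poisson class-conditional model.

    Returns the sorted class labels, the empirical class priors, and the
    per-class Poisson rates (the sample mean of x within each class).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    rates = np.array([x[y == k].mean() for k in classes])
    return classes, priors, rates


def poisson_log_odds(x, priors, rates):
    """Log-odds of class 1 versus class 0 (binary case only).

    The result is affine in the sufficient statistic T(x) = x.
    Assumes both rates are strictly positive.
    """
    x = np.asarray(x, dtype=float)
    eta = np.log(rates)   # natural parameters, eta_k = log(lambda_k)
    log_partition = rates  # A(eta_k) = exp(eta_k) = lambda_k
    return (
        np.log(priors[1] / priors[0])
        + (eta[1] - eta[0]) * x
        - (log_partition[1] - log_partition[0])
    )


def posterior_class1(x, priors, rates):
    """Posterior probability of class 1: the logistic function of the log-odds."""
    return 1.0 / (1.0 + np.exp(-poisson_log_odds(x, priors, rates)))


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    x = [1, 2, 3, 5, 6, 7]
    y = [0, 0, 0, 1, 1, 1]
    classes, priors, rates = fit_poisson_efda(x, y)
    print("rates:", rates)
    print("log-odds at x = 4:", poisson_log_odds(4.0, priors, rates))
    print("P(class 1 | x = 4):", posterior_class1(4.0, priors, rates))
```

In this sketch the slope of the log-odds in $x$ is $\log(\lambda_1/\lambda_0)$, so the posterior is obtained from the fitted generative model rather than from a separately trained discriminative one. For families whose sufficient statistic is nonlinear in $x$ (for example $\log x$ for the Gamma family), the same affine form in $T(x)$ would give a nonlinear boundary in the original feature space.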
