We derive closed-form expressions for the Bayes-optimal decision boundaries in binary classification of high-dimensional, overlapping Gaussian mixture model (GMM) data, and show how, for certain structured data of particular interest, these boundaries depend on the eigenstructure of the class covariances. Through experiments on synthetic GMMs inspired by real-world data, we demonstrate empirically that deep neural networks trained for classification learn predictors that approximate the derived optimal classifiers. We further extend our study to networks trained on authentic data and observe that their decision thresholds correlate with the covariance eigenvectors rather than the eigenvalues, mirroring our GMM analysis. These results offer theoretical insight into neural networks' ability to perform probabilistic inference and distill statistical patterns from intricate distributions.
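To make the notion of a Bayes-optimal boundary between two Gaussian classes concrete, the following is a minimal sketch of the standard log-likelihood-ratio (quadratic discriminant) rule, whose zero level set is the optimal boundary. It is not the closed-form expression derived in this work: the function names, the equal priors, and the two-dimensional toy covariances are all assumptions chosen for illustration. The toy classes share the same eigenvalues and differ only in their eigenvectors, so any separation between them comes from the eigenvectors alone.

```python
import numpy as np


def gaussian_log_density(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape (n, d))."""
    d = mean.shape[0]
    diff = x - mean
    # Solve cov @ z = diff^T rather than forming an explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    mahalanobis = np.sum(diff * solved, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (mahalanobis + logdet + d * np.log(2.0 * np.pi))


def bayes_log_ratio(x, mean_a, cov_a, mean_b, cov_b, prior_a=0.5):
    """Log posterior odds of class A versus class B.

    The Bayes-optimal decision boundary is the set where this value is zero.
    """
    log_prior_odds = np.log(prior_a) - np.log(1.0 - prior_a)
    log_lik_a = gaussian_log_density(x, mean_a, cov_a)
    log_lik_b = gaussian_log_density(x, mean_b, cov_b)
    return log_lik_a - log_lik_b + log_prior_odds


def bayes_predict(x, mean_a, cov_a, mean_b, cov_b, prior_a=0.5):
    """Return 1 where class A is more probable, 0 otherwise."""
    ratio = bayes_log_ratio(x, mean_a, cov_a, mean_b, cov_b, prior_a)
    return (ratio > 0).astype(int)


# Hypothetical toy setup: zero means, identical eigenvalues {4, 1},
# eigenvectors swapped between the two classes.
mean = np.zeros(2)
cov_a = np.diag([4.0, 1.0])
cov_b = np.diag([1.0, 4.0])

points = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
log_ratios = bayes_log_ratio(points, mean, cov_a, mean, cov_b)
labels = bayes_predict(points, mean, cov_a, mean, cov_b)
```

In this toy case the log ratio reduces to 0.375 (x_1^2 - x_2^2), so the boundary is the pair of diagonals x_1 = ±x_2. Points along the first axis fall to class A, points along the second axis fall to class B, and the point (1, 1) lies on the boundary itself.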