In statistical classification/multiple hypothesis testing and machine learning, a model distribution estimated from the training data is typically used in place of the unknown true distribution in the Bayes decision rule, which introduces a mismatch between the Bayes error and the model-based classification error. In this work, we derive classification error bounds to study the relationship between the Kullback-Leibler divergence and this classification error mismatch. We first revisit the statistical bounds on the classification error mismatch derived in previous works, using a different method of derivation. Then, motivated by the observation that the Bayes error is typically low in machine learning tasks such as speech recognition and pattern recognition, we derive a refined Kullback-Leibler-divergence-based bound on the error mismatch under the constraint that the Bayes error is lower than a given threshold.
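To make the central quantities concrete, the following is a minimal numerical sketch (not the refined bound derived in this work) of the mismatch between the Bayes error and the model-based classification error on a toy discrete problem. It checks the mismatch against the classical Pinsker-type chain |E_model − E_Bayes| ≤ 2·TV(p, q) ≤ sqrt(2·D(p‖q)); all distributions below are illustrative made-up numbers.

```python
import numpy as np

# True joint distribution p(x, y) and model joint distribution q(x, y),
# with x in {0, 1, 2, 3} and y in {0, 1}. Rows index x, columns index y.
p = np.array([[0.22, 0.03],
              [0.05, 0.20],
              [0.18, 0.07],
              [0.04, 0.21]])
q = np.array([[0.20, 0.05],
              [0.07, 0.18],
              [0.09, 0.16],   # model flips the decision at x = 2
              [0.06, 0.19]])

def error_of_rule(joint, rule):
    """Classification error under the joint distribution when deciding y = rule[x]."""
    return sum(joint[x, 1 - rule[x]] for x in range(joint.shape[0]))

bayes_rule = p.argmax(axis=1)   # MAP decision rule from the true distribution
model_rule = q.argmax(axis=1)   # MAP decision rule from the model distribution

e_bayes = error_of_rule(p, bayes_rule)  # Bayes error (true distribution, true rule)
e_model = error_of_rule(p, model_rule)  # model-based classification error under p

kl = np.sum(p * np.log(p / q))          # Kullback-Leibler divergence D(p || q)
bound = np.sqrt(2.0 * kl)               # Pinsker-type bound on the error mismatch

print(f"Bayes error      : {e_bayes:.4f}")
print(f"Model-rule error : {e_model:.4f}")
print(f"Error mismatch   : {e_model - e_bayes:.4f}")
print(f"sqrt(2 * KL)     : {bound:.4f}")  # mismatch <= bound holds in this example
```

The chain follows because the error of any fixed decision rule is the probability of an event in the joint space, so it changes by at most the total variation distance TV(p, q) when p is replaced by q, and Pinsker's inequality gives TV(p, q) ≤ sqrt(D(p‖q)/2). The refined bound in this work tightens this kind of relationship under a constraint on the Bayes error.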