We study the training dynamics of neural classifiers through the lens of binary hypothesis testing. We recast classification as a collection of binary tests between the class-conditional distributions induced by learned representations and show empirically that, along training trajectories, well-generalizing networks progressively approach Neyman-Pearson optimal decision rules, as measured by the monotonic growth of the KL divergence retained by their learned representations. We provide sufficient conditions for exact optimality, discuss the implications for training regularization, and define an informational plane, the Evidence-Error plane, on which convergence can be assessed methodically across network architectures.