The Bregman proximal point algorithm (BPPA) has seen a growing number of machine learning applications, yet its theoretical properties remain largely unexplored. We study the computational properties of BPPA through the problem of learning linear classifiers on separable data, and demonstrate that BPPA exhibits provable algorithmic regularization. For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound on the margin attained by BPPA with respect to an arbitrarily chosen norm. This lower bound differs from the maximal margin by a multiplicative factor that depends inversely on the condition number of the distance-generating function, measured in the dual norm. We show that this dependence on the condition number is tight, which demonstrates the importance of the divergence in determining the quality of the learned classifier. We then extend our findings to mirror descent, for which we establish similar connections between the margin and the Bregman divergence, together with a non-asymptotic analysis. Numerical experiments on both synthetic and real-world datasets support our theoretical findings. To the best of our knowledge, these findings are new to the literature on algorithmic regularization.
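As a rough illustration of the mirror descent setting mentioned in the abstract, the sketch below runs mirror descent on a toy separable dataset and reports the normalized Euclidean margin of the resulting classifier. Everything specific here is an assumption made for illustration and is not taken from the paper: the four data points, the exponential loss, the quadratic distance-generating function w(theta) = 0.5 * theta^T A theta (under which the mirror step reduces to a gradient step preconditioned by the inverse of A), the step size, and all function names.

```python
import numpy as np

# Toy linearly separable data (hypothetical, for illustration only).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])


def exp_loss_grad(theta):
    """Gradient of the mean exponential loss mean_i exp(-y_i <x_i, theta>)."""
    weights = np.exp(-y * (X @ theta))
    return -(X * (y * weights)[:, None]).mean(axis=0)


def l2_margin(theta):
    """Normalized Euclidean margin: min_i y_i <x_i, theta> / ||theta||_2."""
    return np.min(y * (X @ theta)) / np.linalg.norm(theta)


def mirror_descent(A, steps=500, lr=0.1):
    """Mirror descent with the assumed potential w(theta) = 0.5 * theta^T A theta.

    The mirror map is grad w(theta) = A theta, so the update
    A theta_next = A theta - lr * grad L(theta) becomes a preconditioned
    gradient step theta_next = theta - lr * A^{-1} grad L(theta).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta = theta - lr * np.linalg.solve(A, exp_loss_grad(theta))
    return theta


# Well-conditioned potential (A = I) versus a poorly conditioned one.
theta_euclid = mirror_descent(np.eye(2))
theta_skewed = mirror_descent(np.diag([1.0, 10.0]))
print("margin with A = I:          ", l2_margin(theta_euclid))
print("margin with A = diag(1, 10):", l2_margin(theta_skewed))
```

Comparing the two printed margins gives a small-scale feel for how the choice of distance-generating function can affect the margin of the learned classifier. It is not a reproduction of the paper's experiments or bounds.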