Group Entropies and Mirror Duality: A Class of Flexible Mirror Descent Updates for Machine Learning

We introduce a comprehensive theoretical and algorithmic framework that bridges formal group theory and group entropies with modern machine learning, paving the way for an infinite, flexible family of Mirror Descent (MD) optimization algorithms. Our approach exploits the rich structure of group entropies, which are generalized entropic functionals governed by group composition laws, encompassing and significantly extending all trace-form entropies such as the Shannon, Tsallis, and Kaniadakis families. By using group-theoretical mirror maps (or link functions) in MD, expressed via multi-parametric generalized logarithms and their inverses (group exponentials), we obtain highly flexible MD updates that can be tailored to diverse data geometries and statistical distributions. To this end, we introduce the notion of \textit{mirror duality}, which allows us to interchange group-theoretical link functions with their inverses, subject to specific learning rate constraints. Tuning or learning the hyperparameters of the group logarithms enables us to adapt the model to the statistical properties of the training distribution while ensuring desirable convergence characteristics. This generality not only provides greater flexibility and improved convergence properties, but also opens new perspectives for applications in machine learning and deep learning by expanding the design of regularizers and natural gradient algorithms. We extensively evaluate the validity, robustness, and performance of the proposed updates on large-scale, simplex-constrained quadratic programming problems.
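To make the idea of a generalized-logarithm link function concrete, the following is a minimal sketch of one MD update on the probability simplex. It is not the paper's algorithm. It assumes the single-parameter Tsallis q-logarithm as the link function (only one special case of the multi-parametric group logarithms mentioned above), restricts q to (0, 1], and projects onto the simplex by simple normalization. The function names (log_q, exp_q, md_step, run_md) and the quadratic objective 0.5 w^T Q w - c^T w are illustrative assumptions.

```python
import numpy as np


def log_q(x, q):
    """Tsallis q-logarithm; reduces to the natural log at q = 1."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(y, q):
    """Inverse of log_q for q in (0, 1], clipped at zero to stay nonnegative."""
    if np.isclose(q, 1.0):
        return np.exp(y)
    base = np.maximum(1.0 + (1.0 - q) * y, 0.0)
    return base ** (1.0 / (1.0 - q))


def md_step(w, grad, lr, q):
    """One mirror descent step: map to the dual space with log_q,
    take a gradient step there, map back with exp_q, then renormalize
    so the iterate sums to one (assumed projection onto the simplex)."""
    w_new = exp_q(log_q(w, q) - lr * grad, q)
    return w_new / w_new.sum()


def run_md(Q, c, w0, lr, q, steps):
    """Minimize 0.5 * w^T Q w - c^T w over the simplex (illustrative objective)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        grad = Q @ w - c  # gradient of the quadratic objective
        w = md_step(w, grad, lr, q)
    return w
```

With q = 1 the step reduces to the familiar exponentiated gradient update; other values of q change the geometry of the step while keeping the same overall structure. For example, `run_md(np.eye(4), np.zeros(4), [0.7, 0.1, 0.1, 0.1], lr=0.5, q=0.5, steps=200)` drives the iterate toward the uniform vector, which minimizes the squared norm on the simplex.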
