Efficient first-order algorithms for large-scale, non-smooth maximum entropy models with application to wildfire science

From arXiv. The main text of our manuscript is 20 pages long, the appendices are 4 pages long, and the references are 4 pages long, for a total of 28 pages.

Maximum entropy (Maxent) models are a class of statistical models that use the maximum entropy principle to estimate probability distributions from data. Due to the size of modern data sets, Maxent models need efficient optimization algorithms to scale well for big data applications. State-of-the-art algorithms for Maxent models, however, were not originally designed to handle big data sets; these algorithms rely on technical devices that may yield unreliable numerical results, scale poorly, or require smoothness assumptions that many practical Maxent models lack. In this paper, we present novel optimization algorithms that overcome these shortcomings when training large-scale, non-smooth Maxent models. Our proposed first-order algorithms leverage the Kullback-Leibler divergence to train large-scale and non-smooth Maxent models efficiently. For Maxent models with a discrete probability distribution of $n$ elements built from samples, each containing $m$ features, the estimation of the stepsize parameters and the iterations of our algorithms scale on the order of $O(mn)$ operations and can be trivially parallelized. Moreover, the strong convexity of the Kullback-Leibler divergence with respect to the $\ell_{1}$ norm allows for larger stepsize parameters, thereby speeding up the convergence of our algorithms. To illustrate the efficiency of our novel algorithms, we consider the problem of estimating probabilities of fire occurrences as a function of ecological features in the Western US MTBS-Interagency wildfire data set. Our numerical results show that our algorithms outperform the state of the art by one order of magnitude and yield results that agree with physical models of wildfire occurrence and previous statistical analyses of wildfire drivers.
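To make the $O(mn)$ cost concrete, here is a minimal sketch of a generic proximal-gradient (ISTA) baseline for an $\ell_{1}$-regularized Maxent model. It is not the Kullback-Leibler-based algorithm proposed in the paper; it only shows that one gradient evaluation and one conservative stepsize estimate each touch the $n \times m$ feature matrix once. The function names, the `reg` parameter, and the choice of an $\ell_{1}$ penalty are assumptions made for illustration, and the sketch assumes strictly positive prior weights and at least one nonzero feature row.

```python
import numpy as np

def gibbs_distribution(features, prior, lam):
    """Return p with p_i proportional to prior_i * exp(<features_i, lam>)."""
    scores = features @ lam + np.log(prior)  # O(mn) matrix-vector product
    scores -= scores.max()                   # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fit_l1_maxent(features, empirical_mean, prior, reg=0.01, iters=500):
    """Proximal gradient on log-partition(lam) - <lam, empirical_mean> + reg * ||lam||_1."""
    n, m = features.shape
    # Conservative stepsize: the Hessian of the smooth part is a covariance
    # matrix whose largest eigenvalue is at most max_i ||features_i||_2^2.
    step = 1.0 / np.max(np.sum(features ** 2, axis=1))  # O(mn)
    lam = np.zeros(m)
    for _ in range(iters):
        p = gibbs_distribution(features, prior, lam)
        grad = features.T @ p - empirical_mean           # O(mn)
        lam = soft_threshold(lam - step * grad, step * reg)
    return lam, gibbs_distribution(features, prior, lam)
```

A typical call passes an `(n, m)` feature array, the length-`m` mean of the observed samples' features, and a length-`n` prior; it returns the dual weights and the fitted distribution over the `n` elements. The stepsize here comes from a Euclidean bound, whereas the abstract attributes the larger stepsizes of the proposed algorithms to strong convexity with respect to the $\ell_{1}$ norm.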
