Generating accurate extremes from an observational data set is crucial when estimating the risk of future extremes that could be larger than those already observed. Applications range from natural disasters to financial crashes. Generative approaches from the machine learning community do not apply to extreme samples without careful adaptation. Meanwhile, asymptotic results from extreme value theory (EVT) provide a theoretical framework for modeling multivariate extreme events, especially through the notion of multivariate regular variation. Bridging these two fields, this paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities. We illustrate the relevance of our approach on a synthetic data set and on a real data set of discharge measurements along the Danube river network. The latter shows the potential of our approach for flood risk assessment. Our approach outperforms the standard VAE on the tested data sets, and we also compare it with a competing EVT-based generative approach. On the tested cases, our approach improves the learning of the dependency structure between extremes.