Generating accurate extremes from an observational data set is crucial when estimating the risk of future extremes that could be larger than those already observed. Applications range from natural disasters to financial crashes. Generative approaches from the machine learning community do not apply to extreme samples without careful adaptation. Meanwhile, asymptotic results from extreme value theory (EVT) provide a theoretical framework for modeling multivariate extreme events, especially through the notion of multivariate regular variation. Bridging these two fields, this paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities. We illustrate the relevance of our approach on a synthetic data set and on a real data set of discharge measurements along the Danube river network. The latter shows the potential of our approach for flood risk assessment. Our approach outperforms the standard VAE on the tested data sets, and we also compare it with a competing EVT-based generative approach. On the tested cases, our approach improves the learning of the dependency structure between extremes.