Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability

arXiv comment (v2): Title updated from "Spatiotemporal Seismic Hazard Assessment Using VQ-VAE and Seismic Statistical Features" to "Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability" to better reflect the focus of the paper. The content is unchanged apart from the title and minor copyediting.

In this paper we build upon a previous study in which we demonstrated, using XGBoost and earthquake catalogue data from Japan and Chile, that a set of 60 seismic statistical features (SSFs) had much greater predictive value than a set of 428 generic time series features from the tsfresh package. Here we extend that work in two key ways, focusing on data from Japan because a large dataset is needed to train a deep learning (autoencoder) model. First, we move from whole-region prediction (considering, for each candidate event, the likelihood of an event of $M \geq 5.0$ anywhere in the region in the next 15 days) to localised prediction, in which both the region of feature computation and the region of prediction are restricted to a circle of radius 24 km around the candidate event, and we show that performance remains excellent, similar to that of our previous whole-region study for the same area. Second, we couple this proven set of SSFs, based on one-dimensional (catalogue) data, with a novel feature based on two-dimensional seismic maps. This feature is obtained by training a VQ-VAE model to reproduce such maps as output and identifying a measure of its reconstruction error with a localised build-up of crustal stress. We show that while localised prediction based on SSFs alone can be effective, with test AUC values as high as those obtained for Japan in our previous whole-region study, including the new natively spatial VQ-VAE-derived feature, which is top-ranked by SHAP analysis, can enhance performance and additionally appears to almost wholly replace the traditionally computed $b$-value in terms of feature usage.
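The localised prediction target described above (an event of $M \geq 5.0$ within 24 km of the candidate event in the following 15 days) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `haversine_km` and `localised_labels`, the array-based catalogue layout, the use of epicentral great-circle distance, and the exclusion of the candidate event itself from its own window are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against rounding slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def localised_labels(t_days, lat, lon, mag,
                     radius_km=24.0, horizon_days=15.0, m_min=5.0):
    """Binary label per candidate event (hypothetical helper).

    Label is 1 if any strictly later event with magnitude >= m_min occurs
    within radius_km of the candidate and within horizon_days after it.
    """
    t_days = np.asarray(t_days, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    mag = np.asarray(mag, dtype=float)

    large = mag >= m_min
    labels = np.zeros(len(t_days), dtype=int)
    for i in range(len(t_days)):
        # Events after the candidate, inside the prediction horizon.
        later = (t_days > t_days[i]) & (t_days <= t_days[i] + horizon_days)
        # Events inside the circle centred on the candidate.
        near = haversine_km(lat[i], lon[i], lat, lon) <= radius_km
        labels[i] = int(np.any(later & large & near))
    return labels


# Toy catalogue (made-up values, for illustration only).
toy_labels = localised_labels(
    t_days=[0.0, 5.0, 10.0, 30.0],
    lat=[35.0, 35.1, 36.0, 35.0],
    lon=[139.0, 139.0, 139.0, 139.0],
    mag=[3.0, 5.2, 5.5, 4.0],
)
print(toy_labels.tolist())  # [1, 0, 0, 0]
```

In the toy catalogue, only the first event receives a positive label: the M 5.2 event follows it by 5 days at roughly 11 km, whereas the M 5.5 event lies about 100 km from every other event. The per-event loop is quadratic in catalogue size; a spatial index would be a natural replacement for a full catalogue.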
