Learning Value-at-Risk (VaR) and Expected Shortfall (ES) is essential for managing financial risk effectively. Existing approaches rely on a small number of parameters, which leaves them vulnerable to model misspecification in the era of big data. To address this limitation, we propose a large tail risk model, the retrieval-enhanced self-grouping autoencoder (ReSGA), which uses millions of parameters and asset characteristics to exploit the rich cross-sectional dependence and long-term temporal dynamics of assets. Applied to monthly US equity returns from 1926 to 2023 with 153 firm characteristics, ReSGA outperforms twelve econometric and machine learning competitors in both out-of-sample loss and statistical backtesting. Its forecasting advantage also translates into significant economic gains in long-short decile portfolios constructed with a new size-enhanced left-side momentum strategy. To clarify the role of complexity, we conduct a systematic scaling analysis and show that improvements in joint VaR-ES forecasting are driven primarily by data complexity rather than model complexity. Finally, our group-importance and transfer-learning analyses demonstrate the interpretability and cross-market generalizability of ReSGA.
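To make the two risk measures and the idea of a joint out-of-sample loss concrete, the following is a minimal sketch, not the ReSGA method. It assumes a simple historical (empirical) estimator and the FZ0 loss, a common choice in the joint VaR-ES literature; the abstract does not state which loss the paper uses. The function names `empirical_var_es` and `fz0_loss` are hypothetical and introduced only for illustration.

```python
import math


def empirical_var_es(returns, alpha=0.05):
    """Historical VaR and ES at level alpha, expressed as returns.

    Assumption: VaR is taken as the k-th smallest return and ES as the
    mean of the k smallest returns, with k = floor(alpha * n), at least 1.
    """
    ordered = sorted(returns)
    k = max(1, math.floor(alpha * len(ordered)))  # number of tail observations
    tail = ordered[:k]
    var = tail[-1]        # boundary of the left tail
    es = sum(tail) / k    # average return within the left tail
    return var, es


def fz0_loss(y, var, es, alpha=0.05):
    """FZ0 joint loss for one realized return y and a (VaR, ES) forecast.

    Assumption: forecasts are on the return scale, so ES must be negative.
    Lower values indicate a better joint forecast.
    """
    if es >= 0:
        raise ValueError("FZ0 loss requires a strictly negative ES forecast")
    hit = 1.0 if y <= var else 0.0  # indicator of a VaR violation
    return -hit * (var - y) / (alpha * es) + var / es + math.log(-es) - 1.0
```

Averaging `fz0_loss` over a hold-out period gives one possible out-of-sample score for comparing competing VaR-ES forecasts.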