This research evaluates the performance of several Recurrent Neural Network (RNN) architectures, namely Simple RNN, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM), against classic algorithms such as Random Forest and XGBoost in building classification models for early crash detection in ASEAN-5 stock markets. The models are evaluated on imbalanced data, which is typical of this problem because market crashes are rare. The study analyzes daily data from 2010 to 2023 for the major stock markets of the ASEAN-5 countries: Indonesia, Malaysia, Singapore, Thailand, and the Philippines. The target variable marks a market crash when a major stock price index falls below the Value at Risk (VaR) threshold at the 5%, 2.5%, or 1% level. The predictors are technical indicators of major local and global markets as well as commodity markets. The study includes 213 predictors with their respective lags (5, 10, 15, 22, 50, 200) and uses a time step of 7, expanding the total number of predictors to 1491 (213 × 7). Data imbalance is addressed with SMOTE-ENN. The results show that all RNN-based architectures outperform Random Forest and XGBoost. Among the RNN architectures, Simple RNN performs best, mainly because the data are not overly complex and carry mostly short-term information. This study extends previous work by covering different geographical zones and time periods and by introducing methodological adjustments.
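The target construction and the 7-step windowing described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say how VaR is estimated, so the code assumes a historical (empirical quantile) VaR on daily log returns computed over the whole sample, and it assumes each window takes the label of its last day. The function names `label_crashes` and `make_windows` are hypothetical, and the data are synthetic stand-ins.

```python
import numpy as np


def label_crashes(prices, alpha=0.05):
    """Return daily log returns, 0/1 crash labels, and the VaR threshold.

    Assumption: VaR is the empirical alpha-quantile of the full return sample.
    """
    returns = np.diff(np.log(prices))
    threshold = np.quantile(returns, alpha)
    labels = (returns < threshold).astype(int)
    return returns, labels, threshold


def make_windows(features, labels, time_steps=7):
    """Stack `time_steps` consecutive days into one sample.

    Assumption: each window takes the label of its last day.
    """
    n_samples = features.shape[0] - time_steps + 1
    windows = np.stack([features[i:i + time_steps] for i in range(n_samples)])
    return windows, labels[time_steps - 1:]


# Synthetic stand-in data (not the study's data): 500 trading days, 213 predictors.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=501)))
features = rng.normal(size=(500, 213))

returns, labels, threshold = label_crashes(prices, alpha=0.05)
X, y = make_windows(features, labels, time_steps=7)

print(X.shape)                         # (494, 7, 213): input shape for an RNN
print(X.reshape(len(X), -1).shape[1])  # 1491 flattened columns for tree models
print(labels.mean())                   # share of crash days, close to alpha
```

The flattened width, 213 × 7 = 1491, matches the predictor count quoted in the abstract. In a real experiment the quantile would more plausibly be estimated on training data only to avoid look-ahead, and resampling such as SMOTE-ENN would be applied to the training split after windowing.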