Financial market simulation (FMS) is a promising tool for understanding market anomalies and the trading behaviors that underlie them. To ensure high-fidelity simulation, the FMS model must be calibrated so that it generates data closely resembling observed market data. Previous efforts have primarily focused on calibrating to mid-price data, which discards essential information about market activity and thus biases the calibrated model. Limit Order Book (LOB) data is the fundamental data that fully captures market microstructure and is adopted by exchanges worldwide. However, existing calibration objective functions cannot be applied to LOB data, because its tabular structure does not meet their requirement for vectorized input. This paper proposes to explicitly learn vectorized representations of the LOB with a Transformer-based autoencoder. The resulting latent vector, which captures the major information in the LOB, can then be used for calibration. Extensive experiments show that the learned latent representation preserves not only the non-linear auto-correlation along the temporal axis but also the precedence between successive price levels of the LOB. Moreover, the performance of the representation learning stage is verified to be consistent with that of the downstream calibration tasks. This work thus advances FMS on LOB data for the first time.
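To make the vectorization step concrete, the following is a minimal sketch of how a calibration objective might operate on latent vectors rather than on the tabular LOB itself. It rests on stated assumptions: the names `mean_encoder` and `calibration_loss`, the four-field row layout, and the mean squared distance are all hypothetical choices made for illustration, and the encoder is a simple time-average placeholder standing in for the Transformer-based autoencoder, not the model proposed here.

```python
from typing import Callable, List, Sequence

# Assumed layout: one LOB snapshot is a table with one row per price level
# (best level first), each row being (bid_price, bid_volume, ask_price, ask_volume).
Snapshot = List[Sequence[float]]
Window = List[Snapshot]
Encoder = Callable[[Window], List[float]]


def mean_encoder(window: Window) -> List[float]:
    """Placeholder encoder, NOT the Transformer autoencoder of the paper.

    Averages every (level, field) entry over time, so that a tabular
    window of snapshots becomes one fixed-length vector.
    """
    n_levels = len(window[0])
    n_fields = len(window[0][0])
    latent = []
    for level in range(n_levels):
        for field in range(n_fields):
            total = sum(snapshot[level][field] for snapshot in window)
            latent.append(total / len(window))
    return latent


def calibration_loss(
    observed: List[Window],
    simulated: List[Window],
    encode: Encoder = mean_encoder,
) -> float:
    """Mean squared distance between latent vectors of paired windows.

    A hypothetical objective: any vector-based calibration distance
    could be substituted once the LOB has been encoded.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must contain the same number of windows")
    total = 0.0
    for obs_window, sim_window in zip(observed, simulated):
        z_obs = encode(obs_window)
        z_sim = encode(sim_window)
        total += sum((a - b) ** 2 for a, b in zip(z_obs, z_sim)) / len(z_obs)
    return total / len(observed)
```

In this sketch a calibrator would search the simulator's parameters for the setting that minimizes `calibration_loss`, with a trained encoder passed in through the `encode` argument in place of the placeholder.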