Financial firms commonly process and store billions of time-series data points, generated continuously and at high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time series through a constrained, Structured Query Language (SQL)-like format, which enables queries such as "Stocks with monthly price returns greater than 5%" but requires them to be expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the storage, computation time, and retrieval complexity required to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework that stores multi-modal data for financial time series in a lower-dimensional latent space using deep encoders, such that the latent-space projections capture not only the time-series trends but also other desirable information or properties of the data (such as price volatility). Moreover, our approach supports user-friendly queries expressed as natural-language text or as sketches of time series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
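To make the idea of retrieval in a latent space concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes a linear PCA projection as a stand-in for the deep encoders, synthetic random-walk price paths as data, and a noisy copy of one stored path as a stand-in for a user's sketch query. All function names (`fit_linear_encoder`, `encode`, `nearest`) and all parameter values are hypothetical.

```python
import numpy as np


def fit_linear_encoder(series: np.ndarray, latent_dim: int):
    """Fit a PCA projection (assumption: a stand-in for a trained deep encoder)."""
    mean = series.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(series - mean, full_matrices=False)
    return mean, vt[:latent_dim]


def encode(series: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project series of shape (n, length) to latent vectors of shape (n, latent_dim)."""
    return (series - mean) @ components.T


def nearest(query_latent: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Return the row indices of the k stored latents closest to the query (Euclidean)."""
    dists = np.linalg.norm(index - query_latent, axis=1)
    return np.argsort(dists)[:k]


rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 random-walk "price paths" of 64 steps each.
paths = rng.normal(size=(200, 64)).cumsum(axis=1)

# Store only the 8-dimensional latents instead of the 64-step raw series.
mean, components = fit_linear_encoder(paths, latent_dim=8)
index = encode(paths, mean, components)

# A noisy copy of path 17 plays the role of a hand-drawn sketch query.
sketch = paths[17] + rng.normal(scale=0.1, size=64)
query_latent = encode(sketch[None, :], mean, components)[0]
top = nearest(query_latent, index, k=3)
print(top)
```

In this sketch the search compares 8 numbers per stored series rather than 64, which illustrates the storage and search savings the abstract refers to. A text query would need a separate encoder that maps language into the same latent space, which is outside the scope of this example.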