This study proposes SynchroStore, a novel storage engine designed to address inefficient update operations in columnar storage systems based on Log-Structured Merge Trees (LSM-Trees) under hybrid workloads. Columnar storage formats offer significant query performance advantages on large-scale datasets, but traditional columnar storage systems suffer from high update complexity and poor real-time performance in data-intensive applications. SynchroStore introduces an incremental row storage mechanism together with a fine-grained row-to-column transformation and compaction strategy to balance update efficiency against query performance. Updates are absorbed by an in-memory row store; once that data is frozen, it is converted to a columnar format to support high-performance reads. The core innovations of SynchroStore are: (1) the combination of incremental row storage with columnar storage; (2) a fine-grained row-to-column transformation and compaction mechanism; and (3) a cost-based scheduling strategy. Together, these features let SynchroStore use background computational resources for row-to-column transformation and compaction without affecting query performance, thereby addressing the update performance bottleneck of columnar storage under hybrid workloads. Experimental evaluation shows that, compared to existing columnar storage systems such as DuckDB, SynchroStore exhibits significant advantages in update performance under hybrid workloads.
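The write path described above (an in-memory row store that absorbs updates, is frozen, and is then converted to columnar form and later compacted) can be illustrated with a minimal sketch. This is not SynchroStore's implementation: the names `HybridTable`, `ColumnSegment`, `upsert`, `freeze`, `compact`, and `scan_column` are hypothetical, every row is assumed to carry all schema columns, and freezing and compaction are invoked explicitly here, whereas the abstract describes them as background work driven by a cost-based scheduler.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ColumnSegment:
    """Immutable columnar segment produced by freezing the row buffer."""
    keys: List[int]
    columns: Dict[str, List[Any]]


@dataclass
class HybridTable:
    """Toy table: mutable row buffer in front of immutable column segments."""
    schema: List[str]
    row_buffer: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    segments: List[ColumnSegment] = field(default_factory=list)

    def upsert(self, key: int, row: Dict[str, Any]) -> None:
        # Updates only touch the row buffer, so they never rewrite a segment.
        self.row_buffer[key] = dict(row)

    def freeze(self) -> None:
        # Row-to-column transformation: transpose buffered rows into columns.
        if not self.row_buffer:
            return
        keys = sorted(self.row_buffer)
        columns = {c: [self.row_buffer[k][c] for k in keys] for c in self.schema}
        self.segments.append(ColumnSegment(keys, columns))
        self.row_buffer = {}

    def compact(self) -> None:
        # Merge all segments into one; the newest version of each key wins.
        latest: Dict[int, Dict[str, Any]] = {}
        for seg in self.segments:  # segments are ordered oldest to newest
            for i, k in enumerate(seg.keys):
                latest[k] = {c: seg.columns[c][i] for c in self.schema}
        if not latest:
            self.segments = []
            return
        keys = sorted(latest)
        columns = {c: [latest[k][c] for k in keys] for c in self.schema}
        self.segments = [ColumnSegment(keys, columns)]

    def scan_column(self, name: str) -> Dict[int, Any]:
        # Read one column across segments, then overlay unfrozen updates.
        result: Dict[int, Any] = {}
        for seg in self.segments:
            result.update(zip(seg.keys, seg.columns[name]))
        for k, row in self.row_buffer.items():
            result[k] = row[name]
        return result
```

In this sketch a reader always sees the newest version of a key because `scan_column` overlays the row buffer on top of the column segments, so freezing and compaction change only the physical layout, not query results.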