Since Gartner coined the term Hybrid Transactional and Analytical Processing (HTAP), numerous HTAP databases have been proposed that combine transactions with analytics to enable real-time data analytics for data-intensive applications. HTAP databases typically process mixed workloads of transactions and analytical queries in a unified system by leveraging both a row store and a column store. Because different storage architectures and processing techniques serve the differing requirements of diverse applications, it is important to summarize the pros and cons of these key techniques. This paper offers a comprehensive survey of HTAP databases. We primarily classify state-of-the-art HTAP databases according to four storage architectures: (a) Primary Row Store and In-Memory Column Store; (b) Distributed Row Store and Column Store Replica; (c) Primary Row Store and Distributed In-Memory Column Store; and (d) Primary Column Store and Delta Row Store. We then review the key techniques in HTAP databases, including hybrid workload processing, data organization, data synchronization, query optimization, and resource scheduling. We also discuss existing HTAP benchmarks. Finally, we outline research challenges and opportunities for HTAP techniques.