The growing demand for database systems that can efficiently manage massive datasets while delivering both real-time transaction processing and advanced analytics has become a critical concern in modern data infrastructure. Traditional OLAP systems often fail to meet these dual requirements. Emerging real-time analytical processing systems still face persistent challenges, such as excessive data redundancy, complex cross-system synchronization, and suboptimal time efficiency. This paper introduces OceanBase Mercury, an OLAP system designed for petabyte-scale data. Its distributed, multi-tenant architecture meets essential enterprise-grade requirements, including continuous availability and elastic scalability. Our technical contributions comprise three key components: (1) an adaptive columnar storage format with hybrid data layout optimization, (2) a differential refresh mechanism for materialized views with temporal consistency guarantees, and (3) a polymorphic vectorization engine that supports three distinct data formats. Empirical evaluations on real-world workloads show that OceanBase Mercury reduces query latency by 1.3× to 3.1× compared with specialized OLAP engines while maintaining sub-second latency. These results position it as an analytical processing (AP) solution that balances analytical depth with operational agility in big data environments.