With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges, including massive read-heavy I/O traffic subject to throttling, as well as skewed and fragmented data access patterns. To address these challenges, this paper introduces the Alluxio local (edge) cache, a highly effective architectural optimization tailored for such environments. This embeddable cache, designed for petabyte-scale data analytics, leverages local SSD resources to reduce network I/O and API call pressure, significantly improving data transfer efficiency. Integrated with OLAP engines like Presto and storage services like HDFS, the Alluxio local cache has demonstrated its effectiveness on large-scale, enterprise-grade workloads over three years of deployment at Uber and Meta. We share insights and operational experiences from implementing these optimizations, offering valuable perspectives on managing modern, massive-scale OLAP workloads.