This paper explores a prevailing industry trend: migrating data-intensive analytics applications from on-premises to cloud-native environments. We find that the unique cost models of cloud-based storage demand a more nuanced understanding of performance optimization. Specifically, based on traces collected from Uber's production Presto fleet, we argue that common I/O optimizations, such as those applied to table scan and filter and to broadcast join, may incur unexpected costs when naively applied in the cloud. Traditional I/O optimizations focus mainly on improving throughput or latency in on-premises settings and do not account for the monetary cost of storage API calls. In cloud environments these costs can be significant: at Uber scale, Presto workloads alone can potentially issue billions of API calls per day. Presented as a case study, this paper serves as a starting point for further research on designing efficient I/O strategies tailored to data-intensive applications in cloud settings.