Enterprises increasingly adopt multi-cloud architectures to take advantage of diverse database engines, regional availability, and cost models. In these environments, ETL pipelines must process large, distributed datasets while minimizing latency and transfer cost. Pushdown optimization, which executes transformation logic inside the database engine rather than in the ETL tool, has proven highly effective in single-cloud systems. Applied across multiple clouds, however, it faces challenges related to data movement, heterogeneous SQL engines, orchestration complexity, and fragmented security controls. This paper examines the feasibility of pushdown optimization in multi-cloud ETL pipelines and analyzes its benefits and limitations. It evaluates localized pushdown, hybrid models, and data federation techniques that reduce cross-cloud traffic while improving performance. A case study spanning Redshift and BigQuery demonstrates measurable gains, including lower end-to-end runtime, reduced transfer volume, and improved cost efficiency. The study highlights practical strategies that organizations can adopt to improve ETL scalability and reliability in distributed cloud environments.