Earth observation (EO) data volumes are increasing rapidly. Cloud computing is now used to process large EO datasets, but the energy efficiency of such processing has received much less attention. This gap matters because awareness of energy costs and carbon footprint in big data processing is growing, particularly as compute-intensive foundation models attract more attention. In this paper we identify gaps in energy efficiency practices within cloud-based EO big data (EOBD) processing and propose several research directions for improvement. We first examine the current EOBD landscape, focusing on the requirements that necessitate cloud-based processing, and analyze existing cloud-based EOBD solutions. We then investigate energy efficiency strategies that have been successfully employed in well-studied big data domains. Through this analysis, we identify several critical gaps in existing EOBD processing platforms, which focus primarily on data accessibility and computational feasibility rather than energy efficiency. These gaps include insufficient energy monitoring mechanisms, a lack of energy awareness in data management, inadequate implementation of energy-aware resource allocation, and a lack of energy efficiency criteria in task scheduling. Based on these findings, we propose developing energy-aware performance monitoring and benchmarking frameworks, applying optimization techniques to infrastructure orchestration, and adopting energy-efficient task scheduling approaches for distributed cloud-based EOBD processing frameworks. These approaches aim to foster greater energy awareness in EOBD processing, potentially reducing power consumption and environmental impact while maintaining processing performance or affecting it only minimally.