The Stream API was added in Java 8 to allow the declarative expression of data-processing logic, typically map-reduce-like data transformations on collections and datasets. The Stream API introduces two key abstractions: the stream, which is a sequence of elements available in a data source, and the stream pipeline, which contains operations (e.g., map, filter, reduce) that are applied to the elements in the stream upon execution. Streams are becoming popular among Java developers because they offer the conciseness of functional programming and ease the parallelization of data processing. Despite these benefits, streams can introduce significant overheads compared with data processing written in imperative code, mainly caused by extra object allocations and reclamations and by the use of virtual method calls. As a result, developers need means to study the runtime behavior of streams, with the goal of both mitigating such abstraction overheads and optimizing stream processing. Unfortunately, there is a lack of dedicated tools that can dynamically analyze streams to help developers locate the specific issues degrading application performance. In this paper, we address the profiling and optimization of streams. We present a novel profiling technique for measuring the computations performed by a stream in terms of elapsed reference cycles, which we use to locate problematic streams with a major impact on application performance. While accuracy is crucial to this end, the inserted instrumentation code causes the execution of extra cycles, which are partially included in the profiles. To mitigate this issue, we estimate and compensate for the extra cycles caused by the inserted instrumentation code. We implement our approach in StreamProf, which, to the best of our knowledge, is the first dedicated stream profiler for the Java Virtual Machine (JVM).
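To make the two abstractions concrete, the following minimal sketch contrasts a stream pipeline with an equivalent imperative loop. The class and method names (`StreamVsLoop`, `sumOfEvenSquaresStream`, `sumOfEvenSquaresLoop`) are hypothetical and chosen only for illustration; the example is not taken from the evaluated workloads.

```java
import java.util.List;

class StreamVsLoop {
    // Declarative version: the list is the data source, and the pipeline
    // chains filter -> map -> reduce over its elements.
    static int sumOfEvenSquaresStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)   // intermediate operation: keep even values
                .map(v -> v * v)           // intermediate operation: square (boxes each result)
                .reduce(0, Integer::sum);  // terminal operation: triggers execution
    }

    // Imperative version computing the same result with a plain loop.
    static int sumOfEvenSquaresLoop(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            if (v % 2 == 0) {
                sum += v * v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquaresStream(values)); // prints 56
        System.out.println(sumOfEvenSquaresLoop(values));   // prints 56
    }
}
```

Both methods return the same value, but the stream version creates pipeline stage objects, invokes lambdas through functional interfaces, and boxes intermediate `Integer` values. These are the kinds of allocations and virtual calls the text refers to as abstraction overhead.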
With StreamProf, we find that cycle profiling is effective at detecting problematic streams whose optimization can enable significant performance gains. We also find that accurately profiling the tasks that support parallel stream processing allows load imbalance to be diagnosed from the distribution of stream-related cycles across threads. We conduct an evaluation on sequential and parallel stream-based workloads that are publicly available from three different sources. The evaluation shows that our profiling technique is efficient and yields accurate profiles. Moreover, we show the actionability of our profiles by guiding stream-related optimizations on two workloads from Renaissance. Our optimizations require the modification of only a few lines of code while achieving speedups of up to 5x.

Java streams have been extensively studied in recent work, which focuses both on how developers use streams and on how to optimize them. Current approaches to the optimization of streams mainly rely on static analysis techniques that overlook runtime information, suffer from important limitations in detecting all streams executed by a Java application, or are not suitable for the analysis of parallel streams. Understanding the dynamic behavior of both sequential and parallel stream processing and its impact on application performance is crucial to help users make better decisions when using streams.
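The thread-level view of parallel stream processing can be illustrated with a rough sketch that records how many elements each worker thread handles. This is not StreamProf's technique: it counts elements rather than reference cycles and uses no bytecode instrumentation. The names `ThreadLoadSketch` and `elementsPerThread` are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

class ThreadLoadSketch {
    // Runs a parallel stream over [0, n) and returns, per thread name,
    // the number of elements that thread processed.
    static Map<String, Long> elementsPerThread(int n) {
        Map<String, LongAdder> counts = new ConcurrentHashMap<>();
        IntStream.range(0, n)
                .parallel()
                .forEach(i -> counts
                        .computeIfAbsent(Thread.currentThread().getName(), k -> new LongAdder())
                        .increment());

        // Copy into a sorted map of plain totals for stable, readable output.
        Map<String, Long> result = new TreeMap<>();
        counts.forEach((thread, adder) -> result.put(thread, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        // The distribution varies between runs and machines.
        elementsPerThread(1_000_000)
                .forEach((thread, count) -> System.out.println(thread + " -> " + count));
    }
}
```

A strongly skewed distribution in such a tally would hint at load imbalance. Element counts are only a coarse proxy, however, since elements can differ widely in cost, which is why the text above relies on per-thread cycle measurements instead.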