The Java Stream API aims to increase developer productivity through an easy-to-read declarative syntax for expressing computations. It also simplifies parallel computing by providing a high-level abstraction over common parallelization concerns. Unfortunately, benchmarks that specifically target stream-based applications are lacking. This gap makes it difficult for researchers and developers of the Java class library to optimize the Stream API. Moreover, without dedicated benchmarks, it is difficult to analyze the performance of streams and to advise developers on how to write efficient code with the API. In this work we present JEDI, a benchmark suite that targets the Stream API. JEDI is automatically generated by converting SQL benchmarks into Java benchmarks. Our code generator supports generating different implementations (both stream-based and imperative) of the same query. The ultimate goal of our benchmark suite, and the main contribution of this work, is to analyze the performance of the different implementations to spot inefficient code structures and better alternatives, suggesting best practices to Java developers. Among the multiple implementations we generate, we focus on different parallelization strategies and explain which strategies are most efficient based on characteristics of the processed data. Finally, the generated imperative code defines a baseline that can guide researchers and Java implementers in optimizing the Stream API.
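As a rough illustration of what "different implementations of the same query" can look like, the sketch below hand-writes one small SQL-style aggregation three ways: as a sequential stream, as a parallel stream, and as an imperative loop. It is not JEDI's generated code; the `Order` type, the query, the threshold, and all method names are hypothetical and chosen only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class QueryVariants {

    // Hypothetical row type for illustration; not part of JEDI.
    static final class Order {
        final String customer;
        final long amountCents;

        Order(String customer, long amountCents) {
            this.customer = customer;
            this.amountCents = amountCents;
        }
    }

    // Illustrative query (an assumption, not taken from any benchmark):
    //   SELECT customer, SUM(amountCents) FROM orders
    //   WHERE amountCents > 1000 GROUP BY customer

    // Variant 1: sequential stream pipeline.
    static Map<String, Long> sequentialStream(List<Order> orders) {
        return orders.stream()
                .filter(o -> o.amountCents > 1000)
                .collect(Collectors.groupingBy(
                        o -> o.customer,
                        TreeMap::new,
                        Collectors.summingLong(o -> o.amountCents)));
    }

    // Variant 2: the same pipeline on a parallel stream.
    static Map<String, Long> parallelStream(List<Order> orders) {
        return orders.parallelStream()
                .filter(o -> o.amountCents > 1000)
                .collect(Collectors.groupingBy(
                        o -> o.customer,
                        TreeMap::new,
                        Collectors.summingLong(o -> o.amountCents)));
    }

    // Variant 3: imperative loop, the kind of code that can serve as a baseline.
    static Map<String, Long> imperative(List<Order> orders) {
        Map<String, Long> totals = new TreeMap<>();
        for (Order o : orders) {
            if (o.amountCents > 1000) {
                totals.merge(o.customer, o.amountCents, Long::sum);
            }
        }
        return totals;
    }

    // Small made-up data set used only to exercise the three variants.
    static List<Order> sampleOrders() {
        return List.of(
                new Order("alice", 2500),
                new Order("bob", 900),
                new Order("alice", 1500),
                new Order("bob", 4000),
                new Order("carol", 1000));
    }

    public static void main(String[] args) {
        List<Order> orders = sampleOrders();
        System.out.println(sequentialStream(orders));
        System.out.println(parallelStream(orders));
        System.out.println(imperative(orders));
    }
}
```

All three variants compute the same result, so any difference a benchmark observes between them comes from how the computation is expressed and scheduled rather than from what it computes. Amounts are kept as `long` so that the parallel and sequential sums agree exactly, which would not be guaranteed with floating-point addition.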