The Java Stream API, introduced in Java 8, makes data processing more expressive and concise than imperative loops. However, this abstraction can carry significant performance overhead, often because multiple intermediate objects are created during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java, which it does by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, an operation available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning streams to variables or passing them as arguments. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, and all 31,799 unit tests executed without failures.
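To make the merging idea concrete, the following source-level sketch shows a map/filter/map pipeline next to a hand-written mapMulti() equivalent. It is only an illustration under stated assumptions: the optimizer described above operates on bytecode, and the code it actually emits may look different. The class name FusionSketch and the methods unfused and fused are hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not the optimizer's actual output.
public class FusionSketch {

    // Original pipeline: three separate intermediate operations.
    static List<Integer> unfused(List<Integer> input) {
        return input.stream()
                .map(x -> x * 2)
                .filter(x -> x % 3 == 0)
                .map(x -> x + 1)
                .collect(Collectors.toList());
    }

    // Hand-fused equivalent: the same three steps expressed as one
    // mapMulti() stage (Stream.mapMulti requires Java 16 or later).
    static List<Integer> fused(List<Integer> input) {
        return input.stream()
                .<Integer>mapMulti((x, sink) -> {
                    int doubled = x * 2;          // first map()
                    if (doubled % 3 == 0) {       // filter()
                        sink.accept(doubled + 1); // second map()
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(unfused(input)); // prints [7, 13]
        System.out.println(fused(input));   // prints [7, 13]
    }
}
```

In the fused version, an element rejected by the filter simply never reaches the sink, so the pipeline remains a stream that can still be assigned to a variable or passed on, rather than being rewritten as a loop.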