Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-reference paradigm, has been shown to be an effective mechanism for addressing these limitations. Here, we investigate integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows. Dask provides an easy-to-use and flexible framework, but is less optimized for scaling certain data-intensive workflows. We investigate these limitations and detail the technical contributions necessary to develop a robust solution for distributed applications. We then demonstrate improved performance on synthetic benchmarks and real applications.