Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-reference paradigm, has been shown to be an effective mechanism for addressing these limitations. Here, we investigate integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows. Dask provides an easy-to-use and flexible framework but is less optimized for scaling certain data-intensive workflows. We investigate these limitations, detail the technical contributions necessary to develop a robust solution for distributed applications, and demonstrate improved performance on synthetic benchmarks and real applications.