The R ecosystem offers a rich variety of map-reduce application programming interfaces (APIs) for iterative computations, yet parallelizing code across these diverse frameworks requires learning multiple, often incompatible, parallel APIs. The futurize package addresses this challenge by providing a single function, futurize(), which transpiles sequential map-reduce expressions into their parallel equivalents in the future ecosystem; the future ecosystem then performs all the heavy lifting. By leveraging R's native pipe operator, users can parallelize existing code with minimal refactoring, often by simply appending `|> futurize()` to an expression. The package supports classical map-reduce functions from base R, purrr, crossmap, foreach, plyr, and BiocParallel, e.g., lapply(xs, fcn) |> futurize() and map(xs, fcn) |> futurize(), as well as a growing set of domain-specific packages, e.g., boot, caret, glmnet, lme4, mgcv, and tm. By abstracting away the underlying parallel machinery and unifying the handling of future options, the package enables developers to declare what to parallelize via futurize(), and end-users to choose how via plan(). This article describes the philosophy, design, and implementation of futurize, demonstrates its usage across various map-reduce paradigms, and discusses its role in simplifying parallel computing in R.
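As a minimal sketch of the division of labor described above, the following R snippet appends `|> futurize()` to a base-R lapply() call and lets the caller select the backend with plan(). The vector `xs`, the function `fcn`, the `multisession` backend, and the worker count are illustrative assumptions, not taken from the article; the snippet also assumes that the futurize and future packages are installed.

```r
# Illustrative sketch only: xs, fcn, and the chosen backend are assumptions.
library(future)    # provides plan() and the parallel backends
library(futurize)  # provides futurize()

# End-user decides HOW to parallelize: here, two background R sessions.
plan(multisession, workers = 2)

xs  <- 1:8
fcn <- function(x) x^2

# Sequential reference result.
ys_seq <- lapply(xs, fcn)

# Developer declares WHAT to parallelize by appending futurize().
ys_par <- lapply(xs, fcn) |> futurize()

# Both versions are expected to produce the same list.
stopifnot(identical(ys_seq, ys_par))

# Return to sequential processing and shut down the workers.
plan(sequential)
```

Changing only the plan() call, for example to plan(sequential), should leave the rest of the code untouched, which is the separation of "what" from "how" that the abstract describes.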