NLP-Guided Synthesis: Transitioning from Sequential Programs to Distributed Programs

As the need for large-scale data processing grows, distributed programming frameworks such as PySpark have become increasingly popular. Converting traditional sequential code to distributed code nevertheless remains a significant hurdle, often requiring specialized knowledge and substantial time. Existing tools automate parts of this conversion but often fall short in speed, flexibility, and overall applicability. In this paper, we introduce ROOP, a tool designed to address these challenges. Using a BERT-based Natural Language Processing (NLP) model, ROOP automates the translation of Python code to its PySpark equivalent, offering a streamlined way to use distributed computing resources. We evaluated ROOP on a diverse set of 14 Python programs comprising 26 loop fragments. ROOP correctly translated 25 of the 26 loop fragments (about 96%). For simpler operations, it completed translations in as little as 44 seconds. ROOP also incorporates a built-in testing mechanism that checks the functional equivalence of the original and translated code, adding an extra layer of reliability. This research opens up new avenues for automating the transition from sequential to distributed programming, making the process more accessible and efficient for developers.
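
To make the idea of a loop fragment and its functionally equivalent translation concrete, the following is a minimal sketch and not ROOP's actual output or test harness. All function names are hypothetical, and the "translated" version uses plain Python `filter`/`map`/`reduce` as a stand-in for the shape of a PySpark RDD pipeline, so that it runs without a Spark installation. The equivalence check is a simple randomized comparison, which is only one possible way such a testing mechanism could work.

```python
import random
from functools import reduce


def sum_of_even_squares_loop(values):
    """Sequential version: the kind of loop fragment a translator takes as input."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


def sum_of_even_squares_mapreduce(values):
    """Filter/map/reduce version, mirroring the shape of an RDD pipeline.

    Assumed PySpark analogue (not executed here), given a SparkContext `sc`:
        sc.parallelize(values)
          .filter(lambda v: v % 2 == 0)
          .map(lambda v: v * v)
          .fold(0, lambda a, b: a + b)
    """
    evens = filter(lambda v: v % 2 == 0, values)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda a, b: a + b, squares, 0)


def check_equivalence(original, translated, trials=200, seed=0):
    """Compare two functions on random integer lists.

    Returns (True, None) if all trials agree, otherwise (False, failing_input).
    Passing this check is evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 50)
        data = [rng.randint(-1000, 1000) for _ in range(size)]
        if original(data) != translated(data):
            return False, data
    return True, None


if __name__ == "__main__":
    ok, counterexample = check_equivalence(
        sum_of_even_squares_loop, sum_of_even_squares_mapreduce
    )
    print("equivalent on sampled inputs:", ok)
```

A randomized check of this kind can only find counterexamples; a translation that drops the filter step, for instance, would be rejected as soon as a sampled list contains an odd number.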

