Programming languages are essential tools for developers, and their evolution plays a crucial role in supporting developers' work. One form of programming language evolution is the introduction of syntactic sugars: additional syntax elements that provide alternative, more readable code constructs. However, the design and evolution of programming languages have traditionally been guided by anecdotal experience and intuition. Recent advances in tools and methodologies for mining open-source repositories have enabled developers to make data-driven software engineering decisions. In light of this, this paper proposes an approach for motivating data-driven programming language evolution by applying frequent subgraph mining techniques to a large dataset of 166,827,154 open-source Java methods. Java control-flow graphs are generalized so that mining the dataset captures broad patterns of language usage and instances of duplication. Frequent subgraphs are then extracted to identify potentially impactful opportunities for new syntactic sugars. Our results demonstrate the benefits of the proposed technique by identifying new syntactic sugars, involving a variety of programming constructs, that could be implemented in Java to simplify frequent code idioms. This approach can potentially provide valuable insights for Java language designers and serve as a proof of concept for data-driven programming language design and evolution.