GEML: A Grammar-based Evolutionary Machine Learning Approach for Design-Pattern Detection

Design patterns (DPs) are recognised as a good practice in software development. However, the lack of appropriate documentation often hampers traceability, and their benefits are blurred among thousands of lines of code. Automatic methods for DP detection have become relevant but are usually based on the rigid analysis of either software metrics or specific properties of the source code. We propose GEML, a novel detection approach based on evolutionary machine learning using software properties of diverse nature. Firstly, GEML makes use of an evolutionary algorithm to extract those characteristics that better describe the DP, formulated in terms of human-readable rules, whose syntax is conformant with a context-free grammar. Secondly, a rule-based classifier is built to predict whether new code contains a hidden DP implementation. GEML has been validated over five DPs taken from a public repository recurrently adopted by machine learning studies. Then, we increase this number up to 15 diverse DPs, showing its effectiveness and robustness in terms of detection capability. An initial parameter study served to tune a parameter setup whose performance guarantees the general applicability of this approach without the need to adjust complex parameters to a specific pattern. Finally, a demonstration tool is also provided.
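The two steps described above (evolving grammar-conformant rules, then using them to classify code) can be illustrated with a minimal sketch. It is not GEML's implementation: the grammar, the property names (`num_methods`, `num_abstract_methods`, `num_children`), the accuracy-based fitness, the mutation-only search and the toy data are all assumptions made for illustration only.

```python
import random

# Hypothetical context-free grammar: a rule is a conjunction of conditions,
# each comparing one (assumed) code property against a threshold.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", "AND", "<rule>"]],
    "<cond>": [["<prop>", "<op>", "<val>"]],
    "<prop>": [["num_methods"], ["num_abstract_methods"], ["num_children"]],
    "<op>": [[">="], ["<="]],
    "<val>": [["0"], ["1"], ["2"], ["3"], ["5"]],
}

def derive(symbol, rng, depth=0, max_depth=3):
    """Expand a grammar symbol into a flat list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]
    options = GRAMMAR[symbol]
    if symbol == "<rule>" and depth >= max_depth:
        options = options[:1]  # bound the recursion of <rule>
    tokens = []
    for sym in rng.choice(options):
        tokens.extend(derive(sym, rng, depth + 1, max_depth))
    return tokens

def matches(tokens, sample):
    """Return True if every condition of the rule holds for the sample."""
    for i in range(0, len(tokens), 4):  # layout: cond AND cond AND ...
        prop, op, val = tokens[i:i + 3]
        if op == ">=" and not sample[prop] >= int(val):
            return False
        if op == "<=" and not sample[prop] <= int(val):
            return False
    return True

def fitness(tokens, data):
    """Fraction of labelled samples the rule classifies correctly."""
    hits = sum(matches(tokens, sample) == label for sample, label in data)
    return hits / len(data)

def mutate(tokens, rng):
    """Replace one randomly chosen condition with a freshly derived one."""
    i = rng.choice(range(0, len(tokens), 4))
    return tokens[:i] + derive("<cond>", rng) + tokens[i + 3:]

def evolve(data, rng, pop_size=20, generations=15):
    """Keep the better half each generation and refill it with mutants."""
    pop = [derive("<rule>", rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda rule: fitness(rule, data), reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return max(pop, key=lambda rule: fitness(rule, data))

# Toy labelled data (invented): True marks a class assumed to implement the DP.
TOY_DATA = [
    ({"num_methods": 5, "num_abstract_methods": 2, "num_children": 3}, True),
    ({"num_methods": 3, "num_abstract_methods": 1, "num_children": 2}, True),
    ({"num_methods": 8, "num_abstract_methods": 0, "num_children": 0}, False),
    ({"num_methods": 2, "num_abstract_methods": 0, "num_children": 1}, False),
]

if __name__ == "__main__":
    best = evolve(TOY_DATA, random.Random(0))
    print("rule:", " ".join(best), "| fitness:", fitness(best, TOY_DATA))
    unseen = {"num_methods": 4, "num_abstract_methods": 1, "num_children": 4}
    print("predicted DP:", matches(best, unseen))
```

Because every individual is produced by `derive` or by swapping in a freshly derived condition, each rule remains a valid sentence of the grammar and stays human-readable, which is the property the abstract emphasises. A fuller classifier would presumably combine several such rules rather than rely on a single best one.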
