Design patterns (DPs) are recognised as good practice in software development. However, a lack of appropriate documentation often hampers their traceability, and their benefits become blurred among thousands of lines of code. Automatic methods for DP detection have therefore become relevant, but they usually rely on a rigid analysis of either software metrics or specific properties of the source code. We propose GEML, a novel detection approach based on evolutionary machine learning that uses software properties of a diverse nature. First, GEML uses an evolutionary algorithm to extract the characteristics that best describe the DP, formulated as human-readable rules whose syntax conforms to a context-free grammar. Second, a rule-based classifier is built to predict whether new code contains a hidden DP implementation. GEML has been validated on five DPs taken from a public repository that is recurrently used in machine learning studies. We then increase this number to 15 diverse DPs, showing the effectiveness and robustness of GEML in terms of detection capability. An initial parameter study served to tune a parameter setup whose performance guarantees the general applicability of the approach, without the need to adjust complex parameters to a specific pattern. Finally, a demonstration tool is provided.
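The two-step idea described above (evolving grammar-conformant rules, then classifying with them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from GEML: the toy grammar, the feature names (`num_methods`, `has_static_instance`, `has_private_ctor`), the accuracy-based fitness, the mutation-only search with elitism, and the five-sample dataset are all hypothetical.

```python
import random

# Hypothetical toy grammar (NOT GEML's actual grammar):
#   rule      ::= condition | condition "AND" rule
#   condition ::= feature op value
# Feature names and value ranges are illustrative assumptions.
FEATURES = {
    "num_methods": [1, 2, 3, 5, 8],
    "has_static_instance": [0, 1],
    "has_private_ctor": [0, 1],
}
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
}

# Toy labelled data: True marks a (pretend) Singleton-like class.
TOY_SAMPLES = [
    {"num_methods": 3, "has_static_instance": 1, "has_private_ctor": 1},
    {"num_methods": 5, "has_static_instance": 1, "has_private_ctor": 1},
    {"num_methods": 2, "has_static_instance": 0, "has_private_ctor": 0},
    {"num_methods": 8, "has_static_instance": 0, "has_private_ctor": 1},
    {"num_methods": 1, "has_static_instance": 1, "has_private_ctor": 0},
]
TOY_LABELS = [True, True, False, False, False]


def random_condition(rng):
    """Derive one `condition` from the toy grammar."""
    feature = rng.choice(sorted(FEATURES))
    return (feature, rng.choice(sorted(OPS)), rng.choice(FEATURES[feature]))


def random_rule(rng, max_conditions=3):
    """Derive one `rule`: a conjunction of 1..max_conditions conditions."""
    return [random_condition(rng) for _ in range(rng.randint(1, max_conditions))]


def rule_fires(rule, sample):
    """A rule fires when every one of its conditions holds for the sample."""
    return all(OPS[op](sample[feature], value) for feature, op, value in rule)


def fitness(rule, samples, labels):
    """Assumed fitness: accuracy of 'rule fires' as a predictor of the label."""
    hits = sum(rule_fires(rule, s) == y for s, y in zip(samples, labels))
    return hits / len(samples)


def mutate(rule, rng):
    """Replace one randomly chosen condition; the result stays grammar-valid."""
    child = list(rule)
    child[rng.randrange(len(child))] = random_condition(rng)
    return child


def evolve(samples, labels, rng, pop_size=30, generations=40, keep=5):
    """Mutation-only evolutionary search with elitism; returns the top rules."""
    def score(rule):
        return fitness(rule, samples, labels)

    population = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[:keep]
        offspring = [mutate(rng.choice(elite), rng) for _ in range(pop_size - keep)]
        population = elite + offspring
    population.sort(key=score, reverse=True)
    return population[:keep]


def predict(rules, sample):
    """Rule-based classifier: report a DP if any of the given rules fires."""
    return any(rule_fires(rule, sample) for rule in rules)


def format_rule(rule):
    """Render a rule in human-readable form."""
    return " AND ".join(f"{f} {op} {v}" for f, op, v in rule)


if __name__ == "__main__":
    best = evolve(TOY_SAMPLES, TOY_LABELS, random.Random(0))
    print(format_rule(best[0]), fitness(best[0], TOY_SAMPLES, TOY_LABELS))
    # Classify an unseen sample using only the best evolved rule.
    unseen = {"num_methods": 2, "has_static_instance": 1, "has_private_ctor": 1}
    print(predict(best[:1], unseen))
```

Because each rule is a plain list of `(feature, operator, value)` conditions, every individual the search produces is a valid sentence of the toy grammar and can be printed directly with `format_rule`, which is the property that keeps the learned model human-readable. A fuller treatment would add crossover and a held-out test set, both omitted here for brevity.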