The accuracy reported for code smell detection tools varies with the dataset used to evaluate them. Our survey of 45 existing datasets reveals that a dataset's adequacy for smell detection depends strongly on properties such as its size, severity levels, project types, the number of instances of each smell type, the total number of smells, and the ratio of smelly to non-smelly samples. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any dataset. We conclude that existing datasets suffer from imbalanced samples, a lack of severity-level support, and restriction to the Java language.