Code comments are a key source of information about software artefacts. Depending on the use case, only some types of comments are useful, so automatic approaches to classifying these comments have been proposed. In this work, we address this need by proposing STACC, a set of SentenceTransformers-based binary classifiers. These lightweight classifiers are trained and tested on the NLBSE Code Comment Classification tool competition dataset. They surpass the baseline by a wide margin, achieving an average F1 score of 0.74 against the baseline's 0.31, a relative improvement of 139%. A replication package and the models themselves are publicly available.
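The 139% figure is the relative gain of the average F1 score over the baseline: (0.74 - 0.31) / 0.31 ≈ 1.387, which rounds to 139%. The minimal Python sketch below reproduces that arithmetic. The helper name `relative_improvement` is hypothetical and is not part of the STACC replication package.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, expressed as a percentage.

    Hypothetical helper used only to illustrate the arithmetic.
    """
    return (new - baseline) / baseline * 100


stacc_f1 = 0.74     # average F1 score reported for STACC
baseline_f1 = 0.31  # average F1 score reported for the baseline

print(f"{relative_improvement(stacc_f1, baseline_f1):.0f}%")  # prints "139%"
```

The percentage is relative to the baseline score. The absolute gain is 0.43 F1 points.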