Trace links between artifacts of the software development life cycle can improve the efficiency of many activities during software development, maintenance, and operations. Unfortunately, creating and maintaining trace links is time-consuming and error-prone. Research on computing trace links automatically has recently gained momentum, e.g., due to the availability of powerful tools in the area of natural language processing. In this paper, we report on observations that we made while studying non-linear similarity measures for computing trace links. We argue that taking a geometric viewpoint on semantic similarity can be helpful for future traceability research. We evaluated our observations on a dataset of four open source projects and two industrial projects. Furthermore, we point out that our findings are more general and can serve as a basis for other information retrieval problems as well.