Fixing software bugs and adding new features are two major software maintenance tasks. Both bugs and feature requests are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. They then execute the query with a search engine to find the exact locations in the source code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trial and error during code search. Over the years, many studies have attempted to reformulate developers' ad hoc queries to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulation from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., using Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, vocabulary mismatch, subjective bias) that might prevent their wide adoption. Finally, we discuss best practices and future opportunities to advance the state of research in search query reformulation.
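To make one of the methodologies named above concrete, the following is a minimal sketch of term weighting applied to a change request. It assumes TF-IDF as the weighting scheme, a small hand-written stop-word list, and a toy corpus of earlier change requests. The function names (`tokenize`, `reformulate`), the example texts, and the smoothing in `idf` are illustrative assumptions, not the method of any particular primary study.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "when", "to", "in", "of", "and", "on", "it", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def reformulate(change_request, corpus, k=3):
    """Return the k terms of the change request with the highest TF-IDF weight.

    corpus is a list of other documents (e.g., earlier change requests) used
    only to estimate how common each term is.
    """
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)
    tf = Counter(tokenize(change_request))

    def idf(term):
        # Smoothed inverse document frequency; rarer terms score higher.
        df = sum(1 for d in docs if term in d)
        return math.log((1 + n) / (1 + df)) + 1.0

    scores = {term: count * idf(term) for term, count in tf.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical example data, not taken from the reviewed studies.
corpus = [
    "Crash when saving file",
    "Crash on startup",
    "Add dark theme to editor",
]
request = ("Editor crash when saving a file with unicode name: "
           "unicode characters break saving")

print(reformulate(request, corpus, k=3))
```

The sketch keeps terms that are frequent in the request but rare elsewhere, which is the basic intuition behind term weighting. A realistic reformulation technique would also need stemming, identifier splitting, and a much larger corpus.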