Database-backed applications rely on database access code to interact with the underlying database management systems (DBMSs). Although many prior studies address database access issues such as SQL anti-patterns or SQL code smells, few examine the database access bugs that arise during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access bugs collected from seven large-scale open source Java applications that use relational database management systems (e.g., MySQL or PostgreSQL). We study the characteristics of these bugs (e.g., occurrence and root causes) by manually examining the bug reports and commit histories. We find that the numbers of reported database access bugs and non-database access bugs follow a similar trend, but the files modified in their bug-fixing commits differ. Additionally, we derive a categorization of the root causes of database access bugs, comprising five main categories (SQL queries, Schema, API, Configuration, SQL query result) and 25 unique root causes. We find that bugs pertaining to SQL queries, Schema, and API account for 84.2% of database access bugs across all studied applications. In particular, SQL query bugs (54%) are the most frequent issue when using JDBC, and API bugs (38.7%) are the most frequent when using Hibernate. Finally, we discuss the implications of our findings for developers and researchers.