Background: Construct validity concerns the use of indicators to measure a concept that is not directly measurable. Aim: This study aims to identify, categorize, assess, and quantify discussions of threats to construct validity in the empirical software engineering literature, and to use the findings to suggest ways to improve the reporting of construct validity issues. Method: We analyzed 83 articles that report human-centric experiments published in five top-tier software engineering journals from 2015 to 2019. The articles' text concerning threats to construct validity was divided into segments (the unit of analysis) based on predefined categories. The segments were then evaluated regarding whether they clearly discussed a threat and a construct. Results: Three-fifths of the segments were associated with topics not related to construct validity. Two-thirds of the articles discussed construct validity without using the definition of construct validity given in the article. The threats were clearly described in more than four-fifths of the segments, but the construct in question was clearly described in only two-thirds of the segments. The construct was unclear when the discussion was not related to construct validity but to other types of validity. Conclusions: The results show potential for improving the understanding of construct validity in software engineering. Recommendations addressing the identified weaknesses are given to improve the awareness and reporting of construct validity.