Context: Software programs can be written in different but functionally equivalent ways. Even though previous research has compared specific formatting elements to find out which alternatives affect code legibility, seeing the bigger picture of what makes code more or less legible is challenging. Goal: We aim to find which formatting elements have been investigated in empirical studies and which alternatives were found to be more legible for human subjects. Method: We conducted a systematic literature review and identified 15 papers containing human-centric studies that directly compared alternative formatting elements. We analyzed and organized these formatting elements using a card-sorting method. Results: We identified 13 formatting elements (e.g., indentation) and 33 levels of formatting elements (e.g., two-space indentation), which are about formatting styles, spacing, block delimiters, long or complex code lines, and word boundary styles. While some levels were found to be statistically better than other equivalent ones in terms of code legibility, e.g., appropriate use of indentation with blocks, others were not, e.g., formatting layout. For identifier style, we found divergent results: one study found a significant difference in favor of camel case, while another found a positive result in favor of snake case. Conclusion: The small number of identified papers, some of which are outdated, and the many null and contradictory results emphasize the relative lack of work in this area and underline the importance of more research. There is much to be understood about how formatting elements influence code legibility before guidelines and automated aids can be created to help developers make their code more legible.