Over time, GitHub has introduced several mechanisms for sharing reusable code artifacts. Beyond fork-based reuse, template repositories offer a distinct way to generate new projects from scaffolding. Although this feature has been available since 2019, little is known about the domains it supports, its maintenance characteristics, or the practices that lead to effective template design. To address this gap, we conduct a large-scale empirical study of GitHub template repositories across the five most used programming languages. First, we mine and categorize templates to analyze the domains they serve, exploring an LLM-as-a-judge strategy. Next, we examine the reliability of templates, using statistical analysis to evaluate how repository characteristics and activity are associated with quality-related issues (e.g., code smells, vulnerabilities, and security hotspots). Finally, we qualitatively analyze a representative subset of templates to derive practical guidelines and identify recurring pitfalls in template design and management. Our results show that Web Development is the predominant domain across ecosystems, while maintenance and quality issues vary by programming language. We further find that high-quality templates tend to adopt established software engineering practices and to provide comprehensive documentation and clear usage guidance. Overall, our findings offer empirical insights and actionable guidance to support practitioners in designing and adopting high-quality template repositories.