Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a twofold increase over 2020 in the number of exposed secrets (database credentials, API keys, and other credentials), totaling more than six million secrets. To our knowledge, the challenges developers face in avoiding checked-in secrets have not yet been characterized. The goal of our paper is to aid researchers and tool developers in understanding and prioritizing opportunities for future research and tool automation for mitigating checked-in secrets, through an empirical investigation of the challenges and solutions related to checked-in secrets. We extract 779 questions related to checked-in secrets on Stack Exchange and apply qualitative analysis to determine the challenges and the solutions proposed by others for each challenge. We identify 27 challenges and 13 solutions. The four most common challenges, in ranked order, are: (i) storing/versioning secrets during deployment; (ii) storing/versioning secrets in source code; (iii) ignoring/hiding secrets in source code; and (iv) sanitizing VCS history. The three most common solutions, in ranked order, are: (i) move secrets out of source code/version control and use a template config file; (ii) secret management in deployment; and (iii) use local environment variables. Our findings indicate that the same solution has been suggested for mitigating multiple challenges. However, our findings also identify an increasing trend in questions lacking accepted solutions, substantiating the need for future research and tool automation on managing secrets.
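Two of the most common solutions named in the abstract, a template config file kept in version control and secrets supplied through local environment variables, can be illustrated together. The following is a minimal Python sketch under stated assumptions, not a method taken from the study: the `${NAME}` placeholder syntax, the contents of `CONFIG_TEMPLATE`, and the `load_config` helper are all hypothetical and chosen only for illustration.

```python
import os

# Hypothetical template that would be committed in place of the real config.
# Only placeholders appear here; the key and variable names are illustrative.
CONFIG_TEMPLATE = {
    "db_host": "localhost",
    "db_user": "${DB_USER}",
    "db_password": "${DB_PASSWORD}",
}


def load_config(template, env=None):
    """Fill ${NAME} placeholders from environment variables.

    Raises KeyError if a referenced variable is not set, so a missing
    secret fails loudly instead of silently falling back to a default.
    """
    env = os.environ if env is None else env
    resolved = {}
    for key, value in template.items():
        if value.startswith("${") and value.endswith("}"):
            name = value[2:-1]  # strip the leading "${" and trailing "}"
            if name not in env:
                raise KeyError(f"missing environment variable: {name}")
            resolved[key] = env[name]
        else:
            # Non-secret settings are copied through unchanged.
            resolved[key] = value
    return resolved
```

In this sketch the real values exist only in the developer's or deployment environment, so the committed file contains nothing that would need to be scrubbed from VCS history later. Calling `load_config(CONFIG_TEMPLATE)` with no second argument reads from the process environment.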