Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions

Recent years have witnessed the rise of extensions in modern Integrated Development Environments (IDEs) such as Visual Studio Code (VSCode), which significantly enhance developer productivity. In particular, popular AI coding assistants such as GitHub Copilot and Tabnine offer conveniences like automated code completion and debugging. While these extensions bring numerous benefits, they may also introduce privacy and security risks for software developers. However, no existing work systematically analyzes these security and privacy concerns, including the risk of data exposure in VSCode extensions. In this paper, we investigate the security issues of cross-extension interactions in VSCode and examine the vulnerabilities caused by data exposure among different extensions. Our study uncovers high-impact security flaws that could allow adversaries to stealthily acquire or manipulate credential-related data (e.g., passwords, API keys, access tokens) from other extensions if extension vendors do not handle that data properly. To measure the prevalence of these flaws, we design a novel automated risk detection framework that leverages program analysis and natural language processing techniques to identify potential risks in VSCode extensions. By applying our tool to 27,261 real-world VSCode extensions, we discover that 8.5% of them (i.e., 2,325 extensions) are exposed to credential-related data leakage through various vectors, such as commands, user input, and configurations. Our study sheds light on the security challenges and flaws of the extension-in-IDE paradigm and provides recommendations for improving the security of VSCode extensions and mitigating the risks of data exposure.
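To make the configuration vector concrete, the following is a minimal sketch of how credential-related settings might be flagged in an extension manifest. It is not the framework described above, which combines program analysis and natural language processing; it substitutes a plain keyword match over the setting names and descriptions declared under `contributes.configuration` in `package.json`. The keyword list, the `findCredentialSettings` helper, and the `exampleCloud.*` settings are all hypothetical and chosen only for illustration.

```typescript
// Illustrative keyword heuristic only; NOT the paper's detection framework.

// Shape of the manifest fields this sketch reads (a small subset of package.json).
interface ConfigurationSection {
  properties?: Record<string, { type?: string | string[]; description?: string }>;
}

interface ExtensionManifest {
  name: string;
  contributes?: {
    // VSCode accepts either a single section or an array of sections here.
    configuration?: ConfigurationSection | ConfigurationSection[];
  };
}

// Hypothetical keyword list assumed for this sketch.
const CREDENTIAL_PATTERN = /(password|passwd|secret|token|api[-_]?key|credential)/i;

// Returns the setting keys whose name or description mentions a credential keyword.
function findCredentialSettings(manifest: ExtensionManifest): string[] {
  const config = manifest.contributes?.configuration;
  if (!config) return [];
  const sections = Array.isArray(config) ? config : [config];
  const flagged: string[] = [];
  for (const section of sections) {
    for (const [key, schema] of Object.entries(section.properties ?? {})) {
      const text = `${key} ${schema.description ?? ""}`;
      if (CREDENTIAL_PATTERN.test(text)) flagged.push(key);
    }
  }
  return flagged;
}

// Hypothetical manifest used only to exercise the helper.
const exampleManifest: ExtensionManifest = {
  name: "example-cloud-tools",
  contributes: {
    configuration: {
      properties: {
        "exampleCloud.apiKey": { type: "string", description: "Key used to call the service" },
        "exampleCloud.region": { type: "string", description: "Deployment region" },
        "exampleCloud.auth": { type: "string", description: "Access token for the account" },
      },
    },
  },
};

console.log(findCredentialSettings(exampleManifest));
```

Settings flagged this way are of interest because values stored in VSCode configuration can be read by any installed extension through `vscode.workspace.getConfiguration`. A keyword match like this one will miss credentials with unusual names and will flag harmless settings, which is one reason a real measurement would need stronger analysis.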

