Software debloating tools seek to improve program security and performance by removing unnecessary code, called bloat. While many techniques have been proposed, several barriers to their adoption have emerged. Namely, debloating tools are highly specialized, making it difficult for adopters to find the right type of tool for their needs. This difficulty is compounded by a lack of established metrics and comparative evaluations between tools. To close this gap, we surveyed 10 years of debloating literature and several tools currently under commercial development to systematize the debloating ecosystem's knowledge. We then conducted a broad comparative evaluation of 10 debloating tools to determine their relative strengths and weaknesses. Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 16 performance, security, correctness, and usability metrics. It surfaces several concerning findings that contradict the prevailing narrative in debloating literature. First, debloating tools lack the maturity required for use on real-world software, as evidenced by a slim 21% overall success rate for creating passable debloated versions of medium- and high-complexity benchmarks. Second, debloating tools struggle to produce sound and robust programs. Using our novel differential fuzzing tool, DIFFER, we discovered that only 13% of our debloating attempts produced a sound and robust debloated program. Finally, our results indicate that debloating tools typically do not improve the performance or security posture of debloated programs by a significant degree. We believe that our contributions in this paper will help potential adopters better understand the landscape of tools and will motivate future research and development of more capable debloating tools. To this end, we have made our benchmark set, data, and custom tools publicly available.