Researchers have recently devised tools for debloating software and detecting configuration errors. Several of these tools rely on the observation that programs are composed of an initialization phase followed by a main-computation phase. Users of these tools must manually annotate the boundary that separates these phases, a task that can be time-consuming and error-prone (typically, the user has to read and understand the source code or trace executions with a debugger). Because annotation errors can impair a tool's accuracy and functionality, the manual-annotation requirement hinders applying these tools at scale. In this paper, we present a field study of 24 widely used C/C++ programs, identifying common boundary properties in 96\% of them. We then introduce \textit{slash}, an automated tool that locates the boundary based on the identified properties. \textit{slash} successfully identifies the boundary in 87.5\% of the studied programs within 8.5\ minutes, using up to 4.4\ GB of memory. In an independent test, carried out after \textit{slash} was developed, \textit{slash} identified the boundary in 85.7\% of a dataset of 21 popular C/C++ GitHub repositories. Finally, we demonstrate \textit{slash}'s potential to streamline the boundary-identification process of software-debloating and error-detection tools.