WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis

Taint analysis is a security analysis technique that tracks the flow of potentially dangerous data through an application and its dependent libraries. Investigating why unexpected flows appear and why expected flows are missing is an important sensemaking step in end-user taint analysis. Existing taint analysis tools often lack this end-user debugging capability: developers cannot ask why, why-not, and what-if questions about dataflows, nor reason about the impact of configuring sources, sinks, and the third-party library models that abstract permissible and impermissible data flows. Furthermore, the tree and list views used in existing taint analyzers' visualizations make it difficult to reason about the global impact on connectivity between multiple sources and sinks. Inspired by the insight that making sense of tool-generated results can be significantly improved by a question-answer (QA) inquiry process, we propose TraceLens, the first end-user question-answer style debugging interface for taint analysis. It enables a user to ask why, why-not, and what-if questions to investigate the existence of suspicious flows, the non-existence of expected flows, and the global impact of third-party library models. TraceLens performs speculative what-if analysis to help a user debug how different connectivity assumptions affect overall results. A user study with 12 participants shows that participants using TraceLens achieved 21% higher accuracy on average, compared with using CodeQL. They also reported a 45% reduction in mental demand (NASA-TLX) and higher confidence in identifying relevant flows when using TraceLens.
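
The three question types can be illustrated on a toy dataflow graph. The sketch below is not TraceLens's implementation; it only assumes that flows form a directed graph and that each question reduces to a reachability query. The graph contents, the node names, and the functions `why`, `why_not`, and `what_if` are all hypothetical and chosen for illustration.

```python
from collections import deque

# Toy dataflow graph (hypothetical): node -> set of successor nodes.
# "StringUtils.trim" has no successors because no library model describes it.
FLOWS = {
    "request.getParameter": {"userInput"},
    "userInput": {"StringUtils.trim"},
    "StringUtils.trim": set(),
    "trimmed": {"Statement.execute"},
    "Statement.execute": set(),
}

def reachable(graph, start):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def reverse(graph):
    """Return a copy of the graph with every edge flipped."""
    rev = {node: set() for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            rev.setdefault(succ, set()).add(node)
    return rev

def why(graph, source, sink):
    """Why does a flow exist? Return one shortest witness path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in sorted(graph.get(node, ())):
            if succ not in parents:
                parents[succ] = node
                queue.append(succ)
    return None

def why_not(graph, source, sink):
    """Why is a flow missing? Return (dead ends reached from the source,
    entry points that lead to the sink), or None if the flow exists."""
    if why(graph, source, sink) is not None:
        return None
    rev = reverse(graph)
    forward = reachable(graph, source)
    backward = reachable(rev, sink)
    dead_ends = sorted(n for n in forward if not graph.get(n))
    entry_points = sorted(n for n in backward if not rev.get(n))
    return dead_ends, entry_points

def what_if(graph, extra_edges, source, sink):
    """What if these edges existed? Re-run the query on a copy of the graph."""
    trial = {node: set(succs) for node, succs in graph.items()}
    for src, dst in extra_edges:
        trial.setdefault(src, set()).add(dst)
        trial.setdefault(dst, set())
    return why(trial, source, sink)
```

In this toy setup, `why_not` points at the gap between `StringUtils.trim` and `trimmed`, and `what_if` with that single extra edge speculatively restores the source-to-sink path without modifying the original graph. A real analyzer would work over a far richer program representation; the sketch only shows the shape of the queries.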
