Explainable AI (XAI) methods reveal which features influence model predictions, yet provide practitioners with limited means to act on these explanations. Activation steering of components identified via XAI offers a path toward actionable explanations, although its practical utility remains understudied. We introduce an interactive workflow that combines sparse autoencoder (SAE)-based attribution with activation steering for instance-level analysis of concept usage in vision models, implemented as a web-based tool. Using this workflow, we conduct semi-structured expert interviews (N=8) built around debugging tasks on CLIP to investigate how practitioners reason about, trust, and apply activation steering. We find that steering enables a shift from inspection to intervention-based hypothesis testing (8/8 participants), and that most participants (6/8) ground their trust in observed model responses rather than in explanation plausibility alone. Participants adopted systematic debugging strategies dominated by component suppression (7/8) and highlighted risks, including ripple effects and the limited generalization of instance-level corrections. Overall, activation steering makes interpretability more actionable while raising important considerations for its safe and effective use.
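To make the attribute-then-steer loop concrete, the following is a minimal sketch under stated assumptions, not the authors' tool. It assumes a standard ReLU sparse autoencoder over a single activation vector and a linear readout (for example, a CLIP class text embedding used as a hypothetical `readout` direction). Attribution is taken as each latent's activation times the projection of its decoder direction onto the readout, and steering by suppression scales selected latents (scale 0 removes them) while adding back the SAE reconstruction error so that untouched parts of the activation are preserved. The names `SparseAutoencoder`, `latent_attribution`, and `steer` are hypothetical.

```python
import numpy as np


class SparseAutoencoder:
    """Hypothetical ReLU SAE: z = relu((x - b_dec) W_enc^T + b_enc), x_hat = z W_dec^T + b_dec."""

    def __init__(self, W_enc, b_enc, W_dec, b_dec):
        self.W_enc = W_enc  # shape (k, d): k latents, d activation dims
        self.b_enc = b_enc  # shape (k,)
        self.W_dec = W_dec  # shape (d, k): column i is latent i's decoder direction
        self.b_dec = b_dec  # shape (d,)

    def encode(self, x):
        # Sparse, non-negative latent code for activation x.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc.T + self.b_enc)

    def decode(self, z):
        # Linear reconstruction of the activation from the latent code.
        return z @ self.W_dec.T + self.b_dec


def latent_attribution(sae, x, readout):
    """Per-latent contribution to the linear score x @ readout (assumed readout)."""
    z = sae.encode(x)
    # Project every decoder direction onto the readout, weight by latent activation.
    return z * (sae.W_dec.T @ readout)


def steer(sae, x, latent_ids, scale=0.0):
    """Scale selected latents (scale=0.0 suppresses them) and rebuild the activation."""
    z = sae.encode(x)
    error = x - sae.decode(z)  # keep what the SAE fails to reconstruct
    z_steered = z.copy()
    z_steered[latent_ids] *= scale
    return sae.decode(z_steered) + error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 8, 16
    sae = SparseAutoencoder(rng.normal(size=(k, d)), np.zeros(k),
                            rng.normal(size=(d, k)), np.zeros(d))
    x, readout = rng.normal(size=d), rng.normal(size=d)
    contrib = latent_attribution(sae, x, readout)
    top = int(np.argmax(contrib))  # latent pushing the score up the most
    print("score before:", x @ readout)
    print("score after suppressing latent", top, ":", steer(sae, x, [top]) @ readout)
```

Because the reconstruction error is added back, suppressing a latent changes the activation only along that latent's decoder direction, which keeps the intervention local to the component being tested; in a real model the steered activation would then be fed through the remaining layers, which is where ripple effects can appear.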