Interpretability and human oversight are fundamental pillars for deploying complex NLP models in real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. Despite existing toolkits for model understanding and analysis, options to integrate human feedback are still limited. We propose IFAN, a framework for real-time explanation-based interaction with NLP models. Through IFAN's interface, users can provide feedback on selected model explanations, which is then integrated through adapter layers to align the model with human rationale. We show the system to be effective in debiasing a hate speech classifier with minimal impact on performance. IFAN also offers a visual admin system and API to manage models (and datasets) as well as control access rights. A demo is live at https://ifan.ml.
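The abstract does not state which adapter architecture IFAN uses. As an illustration only, the sketch below assumes a standard residual bottleneck adapter (down-projection, ReLU, up-projection, skip connection). With this design, feedback-driven updates train only a small number of parameters while the base model stays frozen. The class `BottleneckAdapter` and its dimensions are hypothetical, not IFAN's code.

```python
import numpy as np


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter (illustrative sketch, not IFAN's code).

    Computes h -> h + W_up @ relu(W_down @ h + b_down) + b_up.
    Only these adapter weights would be trained on user feedback;
    the underlying model's weights are assumed to stay frozen.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Down-projection into a small bottleneck, with small random initialisation.
        self.w_down = rng.normal(0.0, 0.02, size=(bottleneck_dim, hidden_dim))
        self.b_down = np.zeros(bottleneck_dim)
        # Up-projection initialised to zero, so the adapter starts as the identity map.
        self.w_up = np.zeros((hidden_dim, bottleneck_dim))
        self.b_up = np.zeros(hidden_dim)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # h has shape (..., hidden_dim): project down, apply ReLU, project up, add residual.
        z = np.maximum(0.0, h @ self.w_down.T + self.b_down)
        return h + z @ self.w_up.T + self.b_up

    def num_params(self) -> int:
        # Number of trainable adapter parameters (weights plus biases).
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


adapter = BottleneckAdapter(hidden_dim=768, bottleneck_dim=64)
h = np.ones((2, 768))
out = adapter(h)
print(out.shape)             # (2, 768)
print(np.allclose(out, h))   # True: zero-initialised up-projection leaves h unchanged
print(adapter.num_params())  # 99136
```

The zero-initialised up-projection is a common design choice. Inserting the adapter initially leaves the model's predictions unchanged, and feedback then moves them gradually. Here that costs about 99k trainable parameters per adapter at a 768-dimensional hidden size.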