Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. Despite existing toolkits for model understanding and analysis, options to integrate human feedback are still limited. We propose IFAN, a framework for real-time explanation-based interaction with NLP models. Through IFAN's interface, users can provide feedback on selected model explanations; this feedback is then integrated through adapter layers to align the model with human rationale. We show the system to be effective in debiasing a hate speech classifier with minimal performance loss. IFAN also offers a visual admin system and an API to manage models and datasets and to control access rights. A demo is live at https://ifan.ml/