Deep neural networks have achieved remarkable performance on a variety of text-based tasks but often lack interpretability, making them less suitable for applications where transparency is critical. To address this, we propose ProtoLens, a novel prototype-based model that provides fine-grained, sub-sentence-level interpretability for text classification. ProtoLens uses a Prototype-aware Span Extraction module to identify relevant text spans associated with learned prototypes, and a Prototype Alignment mechanism to ensure that prototypes remain semantically meaningful throughout training. By aligning prototype embeddings with human-understandable examples, ProtoLens provides interpretable predictions while maintaining competitive accuracy. Extensive experiments demonstrate that ProtoLens outperforms both prototype-based and non-interpretable baselines on multiple text classification benchmarks. Code and data are available at \url{https://anonymous.4open.science/r/ProtoLens-CE0B/}.