LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces

Modern Transformer-based language models achieve strong performance on natural language processing tasks, yet their latent semantic spaces remain largely uninterpretable black boxes. This paper introduces LAG-XAI (Lie Affine Geometry for Explainable AI), a geometric framework that models paraphrasing not as discrete word substitutions but as a structured affine transformation within the embedding space. Treating paraphrasing as a continuous geometric flow on a semantic manifold, we propose a computationally efficient mean-field approximation inspired by local Lie group actions. This allows us to decompose paraphrase transitions into geometrically interpretable components: rotation, deformation, and translation. Experiments on the noisy PIT-2015 Twitter corpus, encoded with Sentence-BERT, reveal a "linear transparency" phenomenon. The proposed affine operator achieves an AUC of 0.7713, against 0.8405 for the non-linear baseline. Measured relative to random chance (AUC 0.5), the affine operator retains approximately 80% of the baseline's effective classification capacity, offering explicit parametric interpretability in exchange for a drop of about 0.07 in absolute AUC. The model identifies fundamental geometric invariants, including a stable matrix reconfiguration angle (~27.84°) and near-zero deformation, indicating local isometry. Cross-domain generalization is confirmed via direct cross-corpus validation on the independent TURL dataset. Furthermore, the practical utility of LAG-XAI is demonstrated in LLM hallucination detection: using a "cheap geometric check," the model automatically detected 95.3% of factual distortions on the HaluEval dataset by registering deviations beyond the permissible semantic corridor. This approach provides a mathematically grounded, resource-efficient path toward the mechanistic interpretability of Transformers.
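The chance-normalized comparison and the rotation/deformation/translation split described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the affine operator is fitted here by ordinary least squares, the rotation and deformation parts are obtained by polar decomposition, and the angle is computed as a Frobenius angle between the matrix and the identity. All function names are hypothetical, the data are synthetic 2D points rather than Sentence-BERT embeddings, and the paper's own definitions (in particular of the "matrix reconfiguration angle") may differ.

```python
import numpy as np


def chance_normalized_ratio(auc_model, auc_baseline, auc_chance=0.5):
    """Share of the baseline's above-chance AUC that the model retains."""
    return (auc_model - auc_chance) / (auc_baseline - auc_chance)


def fit_affine(X, Y):
    """Least-squares fit of Y ≈ X @ A.T + b (rows are embeddings).

    Assumption: plain least squares stands in for the paper's
    mean-field estimator, which the abstract does not specify.
    """
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W[:-1].T, W[-1]  # linear part A, translation b


def polar_decompose(A):
    """Factor A = R @ S with R orthogonal and S symmetric (stretch)."""
    U, s, Vt = np.linalg.svd(A)
    R = U @ Vt
    S = Vt.T @ np.diag(s) @ Vt
    return R, S


def angle_to_identity_deg(A):
    """Frobenius angle between A and the identity, in degrees.

    Assumption: one possible scalar summary of how far A turns the
    space; not necessarily the paper's reconfiguration angle.
    """
    d = A.shape[0]
    cos = np.trace(A) / (np.linalg.norm(A) * np.sqrt(d))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# AUC values quoted in the abstract.
print(chance_normalized_ratio(0.7713, 0.8405))

# Synthetic check: a pure 30-degree rotation plus a shift in 2D.
rng = np.random.default_rng(0)
theta = np.radians(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
t_true = np.array([0.5, -1.0])
X = rng.normal(size=(200, 2))
Y = X @ R_true.T + t_true

A, b = fit_affine(X, Y)
R, S = polar_decompose(A)
deformation = np.linalg.norm(S - np.eye(2))  # near zero for an isometry
print(angle_to_identity_deg(R), deformation, b)
```

In this toy setting the stretch factor `S` stays at the identity, which is the sense in which near-zero deformation indicates a locally isometric map. On real sentence embeddings the fitted matrix is high-dimensional, so a single angle is only a coarse summary.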
