Profiling German Text Simplification with Interpretable Model-Fingerprints

While Large Language Models (LLMs) produce highly nuanced text simplifications, developers currently lack tools for a holistic, efficient, and reproducible diagnosis of their behavior. This paper introduces the Simplification Profiler, a diagnostic toolkit that generates a multidimensional, interpretable fingerprint of simplified texts. Aggregating the fingerprints of multiple simplifications produced by a model yields that model's fingerprint. This novel evaluation paradigm is particularly vital for languages where the data scarcity problem is magnified by the need to build flexible models for diverse target groups rather than for a single, fixed simplification style. We propose that, in this context, measuring a model's unique behavioral signature is a more relevant alternative to correlating metrics with human preferences. We operationalize this with a practical meta-evaluation of our fingerprints' descriptive power, which bypasses the need for large, human-rated datasets. The test measures whether a simple linear classifier can reliably identify model configurations from the simplifications they produce; reliable identification confirms that our metrics are sensitive to a model's specific characteristics. The Profiler can distinguish both high-level behavioral variations between prompting strategies and fine-grained changes from prompt engineering, including few-shot examples. Our complete feature set achieves classification F1 scores of up to 71.9%, improving upon simple baselines by over 48 percentage points. The Simplification Profiler thus offers developers a granular, actionable analysis to build more effective and truly adaptive text simplification systems.
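
The meta-evaluation described above can be illustrated with a minimal sketch. It is not the Profiler's implementation: the configuration names, the two features, and all numeric values are invented for illustration, and a nearest-centroid rule (written as explicit linear scores) stands in for whichever linear classifier is actually used. The sketch averages per-text feature vectors into fingerprints, fits the linear rule, and compares its macro-F1 on held-out fingerprints with a majority-class baseline.

```python
import random
from collections import Counter

# Hypothetical configurations and feature means (e.g. mean sentence length,
# mean word length). All values are invented for illustration only.
CONFIG_MEANS = {
    "zero_shot": [16.0, 6.0],
    "few_shot": [12.0, 5.0],
    "fine_tuned": [8.0, 4.0],
}

def aggregate(rows):
    """Average per-text feature vectors into one fingerprint vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sample_fingerprint(means, rng, n_texts=10, noise=0.5):
    """Simulate n_texts simplifications and aggregate their features."""
    rows = [[rng.gauss(m, noise) for m in means] for _ in range(n_texts)]
    return aggregate(rows)

def fit_linear(X, y):
    """Nearest-centroid rule expressed as one linear score w.x + b per class."""
    model = {}
    for label in sorted(set(y)):
        mu = aggregate([x for x, t in zip(X, y) if t == label])
        model[label] = (mu, -0.5 * sum(m * m for m in mu))
    return model

def predict(model, x):
    """Return the label with the highest linear score."""
    def score(label):
        w, b = model[label]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(model, key=score)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

def make_split(rng, per_class):
    """Draw per_class synthetic fingerprints for every configuration."""
    X, y = [], []
    for label, means in CONFIG_MEANS.items():
        for _ in range(per_class):
            X.append(sample_fingerprint(means, rng))
            y.append(label)
    return X, y

def meta_evaluate(seed=0):
    """Return (classifier macro-F1, majority-baseline macro-F1) on held-out data."""
    rng = random.Random(seed)
    X_train, y_train = make_split(rng, per_class=30)
    X_test, y_test = make_split(rng, per_class=20)
    model = fit_linear(X_train, y_train)
    y_pred = [predict(model, x) for x in X_test]
    majority = Counter(y_train).most_common(1)[0][0]
    baseline = [majority] * len(y_test)
    return macro_f1(y_test, y_pred), macro_f1(y_test, baseline)

if __name__ == "__main__":
    f1, base = meta_evaluate()
    print(f"linear classifier macro-F1: {f1:.3f}, majority baseline: {base:.3f}")
```

Because the synthetic configurations are deliberately well separated, the classifier here is far easier to fit than on real simplifications, so its score says nothing about the reported 71.9%. The point is only the shape of the test: if fingerprints carry configuration-specific signal, a linear rule should clearly beat a baseline that ignores the features.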
