Large language models (LLMs) demonstrate strong general reasoning and language understanding, yet their performance degrades in domains governed by strict formal rules, precise terminology, and legally binding structure. Tax law exemplifies these challenges, as correct answers require exact statutory citation, structured legal argumentation, and numerical accuracy under rigid grading schemes. We introduce SteuerEx, the first open benchmark derived from authentic German university tax law examinations. SteuerEx comprises 115 expert-validated examination questions spanning six core tax law domains and multiple academic levels, and employs a statement-level, partial-credit evaluation framework that closely mirrors real examination practice. We further present SteuerLLM, a domain-adapted LLM for German tax law trained on a large-scale synthetic dataset generated from authentic examination material using a controlled retrieval-augmented pipeline. SteuerLLM (28B parameters) consistently outperforms general-purpose instruction-tuned models of comparable size and, in several cases, substantially larger systems, demonstrating that domain-specific data and architectural adaptation are more decisive than parameter scale for performance on realistic legal reasoning tasks. All benchmark data, training datasets, model weights, and evaluation code are released openly to support reproducible research in domain-specific legal artificial intelligence. A web-based demo of SteuerLLM is available at https://steuerllm.i5.ai.fau.de.