While Large Language Models (LLMs) have achieved strong performance on general-purpose language tasks, their deployment in regulated and data-sensitive domains, including insurance, remains limited. Leveraging millions of historical warranty claims, we propose a locally deployed, governance-aware language modeling component that generates structured corrective-action recommendations from unstructured claim narratives. We fine-tune pretrained LLMs using Low-Rank Adaptation (LoRA), scoping the model to an initial decision module within the claim processing pipeline to speed up claim adjusters' decisions. We assess this module using a multi-dimensional evaluation framework that combines automated semantic similarity metrics with human evaluation, enabling a rigorous examination of both practical utility and predictive accuracy. Our results show that domain-specific fine-tuning substantially outperforms commercial general-purpose and prompt-based LLMs, with approximately 80% of the evaluated cases achieving near-identical matches to ground-truth corrective actions. Overall, this study provides both theoretical and empirical evidence that domain-adaptive fine-tuning can align model output distributions more closely with real-world operational data, demonstrating its promise as a reliable and governable building block for insurance applications.
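The LoRA technique named above replaces full fine-tuning with a trainable low-rank correction to each frozen weight matrix. A minimal NumPy sketch of the update rule follows; the dimensions, variable names, and scaling are illustrative assumptions, not the paper's actual training configuration.

```python
import numpy as np

# Hypothetical sizes for a single frozen weight matrix; in LoRA the
# rank r is chosen much smaller than min(d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 2, 4

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (not updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialized, so training starts
                                        # exactly at the pretrained model

def adapted_forward(x):
    """Forward pass with the LoRA update: (W + (alpha / r) * B @ A) @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0 the adapter is a no-op: the adapted output equals the
# frozen model's output, which is why LoRA training is stable at step 0.
assert np.allclose(adapted_forward(x), W @ x)
```

Only `A` and `B` (2 * r * max(d_out, d_in) parameters per matrix, roughly) receive gradients, which is what makes local fine-tuning of large pretrained models tractable in a data-sensitive deployment.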