Current evaluation paradigms for Large Language Model (LLM) personalization rely heavily on either brittle surface-matching metrics or computationally expensive LLM-as-a-judge protocols, and neither offers interpretability. To address these limitations, we introduce Natural Language Inference Constraint Verification (NLICV), a scalable, semantically invariant framework that represents sentence meanings as sets of truth conditions and verifies personalization constraints with a Natural Language Inference (NLI) model. Moving beyond binary scoring, NLICV categorizes LLM behavior into four distinct modes: personalization, generalization, sycophancy, and failure. Extensive experiments show that NLICV aligns closely with human annotations while drastically reducing the latency and token costs of LLM judges, with inference speedups of up to 2100×. Finally, through an ablation-based procedure, NLICV pinpoints the sentences that drive each constraint verification, yielding faithful, understandable evidence for its evaluations.
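The abstract does not spell out the ablation-based procedure. The following is a minimal sketch of one common variant, leave-one-sentence-out ablation, under stated assumptions: `entails(premise, hypothesis)` is a hypothetical scorer standing in for the NLI model, a constraint counts as verified when that score reaches a hypothetical threshold of 0.5, and a sentence is reported as evidence when removing it flips the verdict. `toy_entails` is a word-overlap placeholder used only so the sketch runs; it is not an NLI model.

```python
import re
from typing import Callable, List

# Hypothetical signature for an NLI-style scorer: returns an entailment score in [0, 1].
EntailFn = Callable[[str, str], float]


def toy_entails(premise: str, hypothesis: str) -> float:
    """Placeholder scorer (NOT an NLI model): fraction of hypothesis words found in the premise."""
    p = set(re.findall(r"[a-z]+", premise.lower()))
    h = set(re.findall(r"[a-z]+", hypothesis.lower()))
    return len(h & p) / len(h) if h else 1.0


def verify_constraint(sentences: List[str], constraint: str,
                      entails: EntailFn, threshold: float = 0.5) -> bool:
    """Treat the constraint as verified if the joined response entails it (assumed threshold)."""
    premise = " ".join(sentences)
    return entails(premise, constraint) >= threshold


def ablation_evidence(sentences: List[str], constraint: str,
                      entails: EntailFn, threshold: float = 0.5) -> List[int]:
    """Return indices of sentences whose removal flips the verification verdict."""
    full_verdict = verify_constraint(sentences, constraint, entails, threshold)
    evidence = []
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]  # drop sentence i only
        if verify_constraint(ablated, constraint, entails, threshold) != full_verdict:
            evidence.append(i)
    return evidence


response = ["I suggest a vegetarian lasagna.", "It bakes in 40 minutes.", "Enjoy your meal!"]
print(verify_constraint(response, "vegetarian lasagna", toy_entails))  # True
print(ablation_evidence(response, "vegetarian lasagna", toy_entails))  # [0]
```

Leave-one-out ablation needs one extra scorer call per sentence, so its cost grows linearly with response length; swapping in a real NLI model only changes the `entails` argument.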