Although large language models (LLMs) demonstrate expert-level medical knowledge, aligning their open-ended outputs with fine-grained clinician preferences remains challenging. Existing methods often rely on coarse objectives or unreliable automated judges that are weakly grounded in professional guidelines. We propose a two-stage framework to address this gap. First, we introduce HealthRubrics, a dataset of 7,034 physician-verified preference examples in which clinicians refine LLM-drafted rubrics to meet rigorous medical standards. Second, we distill these rubrics into HealthPrinciples: 119 broadly reusable, clinically grounded principles organized by clinical dimension, enabling scalable supervision beyond manual annotation. We use HealthPrinciples for (1) offline alignment, by synthesizing rubrics for unlabeled queries, and (2) an inference-time tool for guided self-revision. Trained with our framework, a 30B-parameter model that activates only 3B parameters at inference achieves 33.4% on HealthBench-Hard, outperforming much larger models including DeepSeek-R1 and o3 and establishing a resource-efficient baseline for clinical alignment.