Ensuring factuality is essential for the safe use of Large Language Models (LLMs) in high-stakes domains such as medicine and law. Conformal inference provides distribution-free guarantees, but existing approaches are either overly conservative, discarding many true claims, or rely on adaptive error rates and simple linear models that fail to capture complex group structures. To address these challenges, we reformulate conformal inference in a multiplicative filtering setting, modeling factuality as a product of claim-level scores. Our method, Multi-LLM Adaptive Conformal Inference (MACI), leverages ensembles to produce more accurate factuality scores, which in our experiments led to higher retention, while validity is preserved through group-conditional calibration. Experiments show that MACI consistently achieves user-specified coverage with substantially higher retention and lower time cost than baselines. Our repository is available at https://github.com/MLAI-Yonsei/MACI.
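To make the filtering setup concrete, the following is a minimal sketch of generic split-conformal claim filtering, not the MACI implementation itself: a retention threshold is calibrated on held-out responses so that, with probability at least 1 − α, every retained claim in a new response is factual (assuming exchangeability). The function names and data layout are illustrative assumptions.

```python
import math

def calibrate_threshold(cal_claim_scores, cal_claim_labels, alpha=0.1):
    """Split-conformal calibration of a claim-retention threshold (sketch).

    cal_claim_scores: per-response lists of claim factuality scores.
    cal_claim_labels: per-response lists of booleans (True = claim is factual).
    Returns tau such that retaining claims with score > tau keeps all claims
    factual with probability >= 1 - alpha on an exchangeable test response.
    """
    # Nonconformity score per calibration response: the highest factuality
    # score among its FALSE claims, i.e. the smallest threshold that would
    # filter out every false claim in that response.
    scores = []
    for s, y in zip(cal_claim_scores, cal_claim_labels):
        false_scores = [si for si, yi in zip(s, y) if not yi]
        scores.append(max(false_scores) if false_scores else float("-inf"))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # alpha too strict for this calibration size
    return sorted(scores)[k - 1]  # conformal quantile

def filter_claims(claim_scores, tau):
    # Retain only claims whose score strictly exceeds the threshold.
    return [i for i, s in enumerate(claim_scores) if s > tau]
```

MACI's contribution, per the abstract, is to improve the scores fed into such a filter (via multi-LLM ensembles and a multiplicative, product-of-claim-scores formulation) and to calibrate the threshold per group rather than marginally, which this sketch does not show.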