Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness: the equity of downstream outcomes. Theoretically, we derive an upper bound that decomposes prediction-set size disparity into interpretable components, clarifying how label-clustered CP helps control method-driven contributions to unfairness. To facilitate scalable empirical analysis, we introduce an LLM-in-the-loop evaluator that approximates human assessment of substantive fairness across diverse modalities. Our experiments reveal that label-clustered CP variants consistently deliver superior substantive fairness. Finally, we empirically show that equalized set sizes, rather than coverage, strongly correlate with improved substantive fairness, enabling practitioners to design fairer CP systems. Our code is available at https://github.com/layer6ai-labs/llm-in-the-loop-conformal-fairness.
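To make concrete the two quantities the abstract contrasts (per-group coverage vs. per-group set size), here is a minimal sketch of standard split conformal prediction with hypothetical helper names; it is illustrative only, not the paper's label-clustered method or its released code.

```python
# Minimal split conformal prediction sketch (illustrative assumption, not the
# paper's method). Computes the two per-group quantities contrasted above:
# empirical coverage and average prediction-set size.
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split CP with the 1 - p(true label) nonconformity score."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # Boolean matrix: sets[i, y] is True iff label y is in example i's set.
    return test_probs >= 1.0 - q

def per_group_metrics(sets, labels, groups):
    """Per-group empirical coverage and mean prediction-set size."""
    metrics = {}
    for g in np.unique(groups):
        mask = groups == g
        covered = sets[mask, labels[mask]]  # true-label membership per example
        metrics[g] = {
            "coverage": covered.mean(),
            "avg_set_size": sets[mask].sum(axis=1).mean(),
        }
    return metrics
```

In this view, "equalized set sizes" means avg_set_size is similar across sensitive groups, while coverage can match the marginal 1 - alpha guarantee yet still coexist with large size disparities.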