Many real-world multi-label prediction problems involve set-valued predictions that must satisfy specific requirements dictated by downstream usage. We focus on a typical scenario where such requirements, separately encoding $\textit{value}$ and $\textit{cost}$, compete with each other. For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed FavMac, to maximize the value while controlling the cost in such scenarios. FavMac can be combined with almost any multi-label classifier, affording distribution-free theoretical guarantees on cost control. Moreover, unlike prior works, it can handle real-world large-scale applications via a carefully designed online update mechanism, which is of independent interest. Our methodological and theoretical contributions are supported by experiments on several healthcare tasks and synthetic datasets: FavMac furnishes higher value than several variants and baselines while maintaining strict cost control. Our code is available at https://github.com/zlin7/FavMac