Multi-label classification is a common challenge in machine learning applications, where a single data instance can be associated with multiple classes simultaneously. This paper proposes a novel tree-based method for multi-label classification using conformal prediction and multiple hypothesis testing. The proposed method applies hierarchical clustering with labelsets to build a hierarchical tree, which is then formulated as a multiple-testing problem with a hierarchical structure. The split-conformal prediction method is used to obtain a marginal conformal $p$-value for each tested hypothesis, and two \textit{hierarchical testing procedures} are developed based on these marginal conformal $p$-values: a hierarchical Bonferroni procedure and a modification of it, both designed to control the family-wise error rate. The prediction sets are then formed from the testing outcomes of these two procedures. We establish a theoretical guarantee of valid coverage for the prediction sets by proving that both procedures control the family-wise error rate. We demonstrate the effectiveness of our method in a simulation study and two real-data analyses, in comparison with other conformal methods for multi-label classification.
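To make the $p$-value step concrete, the following is a minimal sketch of the standard marginal split-conformal $p$-value, assuming nonconformity scores for which larger values mean "less conforming". The function names and the toy scores are hypothetical and are not taken from the paper. The correction shown is the plain (non-hierarchical) Bonferroni rule, used here only as a stand-in: the abstract does not specify the paper's hierarchical Bonferroni procedure or its modification, so neither is reproduced.

```python
def conformal_p_value(calibration_scores, test_score):
    """Marginal split-conformal p-value.

    Assumes larger scores indicate less conformity. The p-value is the
    fraction of scores (calibration scores plus the test score itself)
    that are at least as large as the test score.
    """
    n = len(calibration_scores)
    at_least_as_large = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_large) / (n + 1)


def bonferroni_reject(p_values, alpha):
    """Plain Bonferroni: reject hypothesis j when p_j <= alpha / m.

    This is the ordinary, non-hierarchical rule, shown as an illustration;
    it is not the hierarchical procedure proposed in the paper.
    """
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical toy data: four calibration scores and one test score.
calibration = [0.1, 0.2, 0.3, 0.4]
p = conformal_p_value(calibration, 0.35)

# Hypothetical p-values for three tested hypotheses at level alpha = 0.05.
decisions = bonferroni_reject([0.001, 0.2, 0.03], alpha=0.05)
```

In this sketch a rejected hypothesis would be excluded from the prediction set, and family-wise error rate control at level $\alpha$ is what yields coverage of at least $1 - \alpha$ for the retained candidates.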