Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have very different consequences. Two popular paradigms have been developed to address this asymmetry: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies of the NP paradigm have focused primarily on the binary case; the multi-class NP problem is more challenging because its feasibility is not known in advance. In this work, we tackle the multi-class NP problem by connecting it to the CS problem via strong duality, and we propose two algorithms. We extend the concept of NP oracle inequalities, which is central to binary classification, to NP oracle properties in the multi-class setting. Our algorithms satisfy these NP oracle properties under certain conditions. Furthermore, we develop practical algorithms to assess feasibility and strong duality in multi-class NP problems; these give practitioners an overview of a multi-class NP problem across different target error levels. Simulations and real data studies validate the effectiveness of our algorithms. To our knowledge, this is the first study to address the multi-class NP problem with theoretical guarantees. The proposed algorithms have been implemented in the R package \texttt{npcs}, which is available on CRAN.
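The NP-to-CS connection can be illustrated with a small sketch. It assumes one common formulation that the abstract does not spell out: minimize a weighted sum of per-class errors $\sum_k w_k R_k(\phi)$, with $R_k(\phi) = P(\phi(X) \neq k \mid Y = k)$, subject to $R_k(\phi) \le \alpha_k$ for a chosen subset of classes. Attaching multipliers $\lambda_k \ge 0$ gives the Lagrangian $\sum_k (w_k + \lambda_k) R_k(\phi) - \sum_k \lambda_k \alpha_k$, whose pointwise minimizer over $\phi$ is the cost-sensitive rule $\phi_\lambda(x) = \arg\max_k (w_k + \lambda_k) P(Y = k \mid X = x) / \pi_k$ with $\pi_k = P(Y = k)$. When strong duality holds, maximizing the dual $G(\lambda)$ over $\lambda \ge 0$ recovers the NP optimum. The Python sketch below plugs estimated posteriors into this rule and maximizes a plug-in estimate of $G$ over a coarse grid. The function names (\texttt{cs\_predict}, \texttt{class\_errors}, \texttt{dual\_value}, \texttt{fit\_lambda}) and the synthetic data are hypothetical; this is not the API of \texttt{npcs} and not necessarily either of the two algorithms proposed here.

```python
import itertools

import numpy as np

# Hypothetical sketch of the NP -> CS connection via the Lagrangian dual.
# Assumed formulation: minimize sum_k w_k R_k subject to R_k <= alpha_k for k in `constrained`.


def cs_predict(post, prior, w, lam):
    """Cost-sensitive plug-in rule: argmax_k (w_k + lam_k) * P(Y=k | x) / pi_k.

    post:  (n, K) estimated posteriors P(Y=k | x_i)
    prior: (K,) estimated class proportions pi_k
    w:     (K,) objective weights; lam: (K,) non-negative dual variables
    """
    scores = (w + lam) * post / prior
    return np.argmax(scores, axis=1)


def class_errors(y, yhat, K):
    """Empirical per-class error R_k = P(yhat != k | y = k); assumes every class occurs in y."""
    return np.array([np.mean(yhat[y == k] != k) for k in range(K)])


def dual_value(post, prior, y, w, lam, alpha):
    """Return (G, R): plug-in G(lam) = sum_k (w_k + lam_k) R_k - sum_k lam_k alpha_k, and R.

    alpha must be finite; entries for unconstrained classes are ignored because lam_k = 0.
    """
    R = class_errors(y, cs_predict(post, prior, w, lam), len(prior))
    return np.sum((w + lam) * R) - np.sum(lam * alpha), R


def fit_lambda(post, prior, y, w, alpha, constrained, grid):
    """Coarse grid search for lam >= 0 maximizing the estimated dual.

    Only classes listed in `constrained` get a dual variable; all others keep lam_k = 0.
    """
    K = len(prior)
    best_lam, best_val, best_R = None, -np.inf, None
    for vals in itertools.product(grid, repeat=len(constrained)):
        lam = np.zeros(K)
        lam[constrained] = vals
        val, R = dual_value(post, prior, y, w, lam, alpha)
        if val > best_val:
            best_lam, best_val, best_R = lam, val, R
    return best_lam, best_val, best_R


if __name__ == "__main__":
    # Synthetic (hypothetical) data: noisy posteriors that favour the true class.
    rng = np.random.default_rng(0)
    n, K = 600, 3
    y = rng.integers(0, K, size=n)
    g = rng.gamma(1.0 + 3.0 * np.eye(K)[y])
    post = g / g.sum(axis=1, keepdims=True)
    prior = np.bincount(y, minlength=K) / n
    # Assumed target: cap the class-1 error at 0.1; alpha_0 and alpha_2 are unused.
    alpha = np.array([1.0, 0.1, 1.0])
    lam, val, R = fit_lambda(post, prior, y, np.ones(K), alpha, [1], np.linspace(0, 10, 41))
    print("lambda:", lam, "per-class errors:", R, "class-1 target met:", R[1] <= alpha[1])
```

Because the dual is estimated on the same sample used to evaluate the errors, this is only a heuristic. If a constraint is infeasible, the dual tends to grow without bound in the corresponding $\lambda_k$, so a maximizer at the edge of the grid suggests that the target level $\alpha_k$ may not be attainable.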