In this paper, we propose a novel conceptual framework for detecting outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, in this approach outlier removal is not informed by the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step toward using an optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function we introduce in this paper is the key to making our estimator effectively identify outliers during the optimization process. We discuss the fundamental differences between our estimator and optimal transport-based distributionally robust optimization estimators. Finally, we demonstrate the effectiveness and superiority of our approach over conventional approaches in extensive simulation and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
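To give some intuition for why a concave transport cost can single out outliers, here is a minimal sketch. It assumes a power cost of the form c(d) = |d|^r, with r in (0, 1) for the concave case and r = 2 for a convex comparison; the abstract does not specify the cost, so this form, the helper `transport_cost`, and the toy displacement lists are illustrative assumptions rather than the paper's construction. The sketch compares two ways of spending the same total displacement: moving one point far, or nudging every point slightly.

```python
def transport_cost(displacements, r):
    """Total cost of moving points by the given distances under c(d) = |d|**r.

    The power form of the cost is an assumption made for illustration only.
    """
    return sum(abs(d) ** r for d in displacements)


# Two hypothetical ways to move a 10-point sample by the same total distance.
one_far = [10.0] + [0.0] * 9  # relocate a single (suspected outlier) point by 10
all_near = [1.0] * 10         # shift every point by 1

# Concave cost (r = 0.5): a single long move is cheaper than many small moves.
concave_far = transport_cost(one_far, 0.5)
concave_near = transport_cost(all_near, 0.5)

# Convex cost (r = 2): the ordering flips, and many small moves are cheaper.
convex_far = transport_cost(one_far, 2.0)
convex_near = transport_cost(all_near, 2.0)

print("concave:", concave_far, "vs", concave_near)
print("convex: ", convex_far, "vs", convex_near)
```

Under the concave cost, a fixed transport budget is spent most efficiently by moving a few points a long way, which is consistent with the abstract's claim that concavity is what lets the estimator identify outliers. Under the convex cost, the same budget favors small perturbations spread across all points.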