In this paper, we propose a novel conceptual framework for detecting outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: outliers are first detected and removed, and estimation is then performed on the cleaned data. Because outlier removal in this procedure is not informed by the estimation task, it leaves room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take a first step toward using the optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions, and we then select the best distribution within this set to perform the estimation task. Notably, the concave cost function introduced in this paper is what enables our estimator to identify outliers effectively during the optimization process. In simulations and empirical analyses, we demonstrate the effectiveness of our approach over conventional approaches for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
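One intuition for why a concave transport cost helps single out outliers can be seen in a small numerical sketch. The power cost c(d) = d**r, the exponents, and the shift size below are assumptions chosen purely for illustration; they are not the specific cost function or settings used in the paper. The sketch compares the cost of moving a single point a long way with the cost of spreading the same total displacement evenly over many points.

```python
def total_cost(total_shift, n_points, r):
    """Cost of splitting total_shift evenly over n_points under c(d) = d**r.

    Hypothetical helper for illustration only; r < 1 gives a concave cost,
    r > 1 a convex one.
    """
    per_point = total_shift / n_points
    return n_points * per_point ** r


total_shift = 10.0  # assumed total displacement to be distributed

# Concave cost (r = 0.5): one long move versus ten short moves.
concave_one = total_cost(total_shift, 1, 0.5)
concave_ten = total_cost(total_shift, 10, 0.5)

# Convex cost (r = 2.0): the same comparison.
convex_one = total_cost(total_shift, 1, 2.0)
convex_ten = total_cost(total_shift, 10, 2.0)

print("concave:", concave_one, concave_ten)
print("convex: ", convex_one, convex_ten)
```

Under the concave cost, the single long move is cheaper than the ten short moves, while under the convex cost the ordering reverses. A fixed transport budget with a concave cost is therefore more naturally spent on relocating a few distant points than on nudging every point slightly, which is consistent with the claim that concavity is what lets the estimator pick out outliers.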