We study computational aspects of a key problem in robust statistics: the penalized least trimmed squares (LTS) regression problem. Penalized LTS is a robust estimator that mitigates the influence of outliers by capping residuals with large magnitudes. Although statistically attractive, penalized LTS is NP-hard, and existing mixed-integer optimization (MIO) formulations scale poorly due to weak relaxations and exponential worst-case complexity in the number of observations. We propose a new MIO formulation that embeds hyperplane arrangement logic into a perspective reformulation, explicitly enforcing structural properties of optimal solutions. We show that, if the number of features is fixed, the resulting branch-and-bound tree is of polynomial size in the sample size. Moreover, we develop a tailored branch-and-bound algorithm that uses first-order methods with dual bounds to solve node relaxations efficiently. Computational experiments on synthetic and real datasets demonstrate substantial improvements over existing MIO approaches: on synthetic instances with 5000 samples and 20 features, our tailored solver reaches a 1% optimality gap within one minute, while competing approaches fail to do so within one hour. These gains enable exact robust regression at significantly larger sample sizes in low-dimensional settings.
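The abstract does not spell out the objective, so the sketch below assumes one common capped-loss form of penalized LTS: minimize sum_i min(r_i^2, mu) + lam * ||beta||^2, where r_i = y_i - x_i^T beta. Fixing which observations are inliers reduces this to ridge regression on the inliers plus mu per flagged outlier, so a naive exact method must search over all 2^n labelings. The helper `max_regions` uses the classical bound on the number of regions cut out by m affine hyperplanes in R^d. Under the assumed form, each observation contributes two hyperplanes |r_i| = sqrt(mu) in beta-space, so with d = p fixed only polynomially many labelings are realizable. This is a hedged illustration of that combinatorial gap, not the authors' formulation or solver. All function names (`brute_force_penalized_lts`, `max_regions`, and so on) are hypothetical.

```python
import itertools
from math import comb


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small, nonsingular A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def residuals(beta, X, y):
    """r_i = y_i - x_i^T beta for every observation."""
    return [yi - sum(b * xij for b, xij in zip(beta, xi)) for xi, yi in zip(X, y)]


def ridge_on_subset(X, y, mask, lam):
    """Ridge fit using only rows with mask[i] True: (X_S^T X_S + lam I) beta = X_S^T y_S."""
    p = len(X[0])
    A = [[lam if j == k else 0.0 for k in range(p)] for j in range(p)]
    rhs = [0.0] * p
    for xi, yi, keep in zip(X, y, mask):
        if keep:
            for j in range(p):
                rhs[j] += xi[j] * yi
                for k in range(p):
                    A[j][k] += xi[j] * xi[k]
    return solve(A, rhs)


def penalized_lts_objective(beta, X, y, mu, lam):
    """Assumed capped-loss objective: sum_i min(r_i^2, mu) + lam * ||beta||^2."""
    r = residuals(beta, X, y)
    return sum(min(ri * ri, mu) for ri in r) + lam * sum(b * b for b in beta)


def brute_force_penalized_lts(X, y, mu, lam):
    """Exact solution for tiny n by enumerating all 2^n inlier/outlier labelings.

    For a fixed labeling the optimal beta is the ridge fit on the inliers, and the
    cost is inlier SSE + mu * (#outliers) + lam * ||beta||^2. Requires lam > 0 so
    that the all-outlier labeling still yields a solvable system.
    """
    best = (float("inf"), None, None)
    for mask in itertools.product([True, False], repeat=len(y)):
        beta = ridge_on_subset(X, y, mask, lam)
        r = residuals(beta, X, y)
        val = (sum(ri * ri for ri, keep in zip(r, mask) if keep)
               + mu * mask.count(False)
               + lam * sum(b * b for b in beta))
        if val < best[0]:
            best = (val, beta, list(mask))
    return best  # (optimal value, beta, inlier mask)


def max_regions(m, d):
    """Maximum number of regions of an arrangement of m affine hyperplanes in R^d."""
    return sum(comb(m, k) for k in range(d + 1))
```

For example, on six points on the line y = 1 + 2x with one point shifted by 10, `brute_force_penalized_lts` with `mu=1.0` and a tiny `lam` flags exactly that point as the outlier. With n = 100 observations and p = 2 features, `max_regions(2 * 100, 2)` is 20101, while full enumeration would visit 2**100 labelings.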