In many applications, linear regression models must account for outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, which typically linearize the cubic terms using big-M constraints, suffer from weak relaxations and poor performance in practice. In this work, we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders of magnitude faster than existing big-M formulations in the literature for this problem.
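To make the structure of the cubic terms concrete, the following is a minimal sketch under assumptions not stated in the abstract. It assumes a trimmed least-squares objective of the form sum_i (1 - z_i) * (y_i - a - b*x_i)^2, where z_i = 1 marks point i as an outlier and at most k points may be discarded. The function names, the single-feature model, and the toy data are all hypothetical. The sketch solves the problem by brute-force enumeration; it does not implement the big-M linearization or the conic relaxations discussed in the abstract.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs.

    Assumes the x values are not all identical (otherwise sxx is zero).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def trimmed_objective(points, z, a, b):
    """Assumed objective: each term is a binary factor times a quadratic."""
    return sum((1 - zi) * (y - a - b * x) ** 2
               for (x, y), zi in zip(points, z))


def brute_force_trimmed_fit(points, k):
    """Enumerate every choice of exactly k discarded points.

    Discarding more points never increases the objective, so checking
    exactly k is enough for the constraint sum(z) <= k.
    """
    n = len(points)
    best = None
    for outliers in combinations(range(n), k):
        z = [1 if i in outliers else 0 for i in range(n)]
        kept = [p for p, zi in zip(points, z) if zi == 0]
        a, b = fit_line(kept)
        obj = trimmed_objective(points, z, a, b)
        if best is None or obj < best[0]:
            best = (obj, z, a, b)
    return best


# Hypothetical toy data: points on y = 1 + 2x, with the point at x = 3 corrupted.
data = [(0, 1), (1, 3), (2, 5), (3, 100), (4, 9), (5, 11)]
obj, z, a, b = brute_force_trimmed_fit(data, k=1)
print(z, a, b)  # z flags index 3 as the discarded point
```

The enumeration visits every subset of k points, so its cost grows combinatorially with the number of data points. This is why such problems are posed as mixed-integer programs in the first place, and why the strength of the continuous relaxation matters for solver performance.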