We consider the following problem in computational geometry: given, in d-dimensional real space, a set of points marked as positive and a set of points marked as negative, such that the convex hull of the positive set does not intersect the negative set, find K hyperplanes that separate, if possible, all the positive points from the negative ones. That is, we search for a convex polyhedron with at most K faces that contains all the positive points and no negative point. The problem has been studied in the literature for pure convex polyhedral approximation; our interest stems from its possible applications in constraint learning, where points are feasible or infeasible solutions of a Mixed Integer Program, and the K hyperplanes are linear constraints to be found. We cast the problem as an optimization one, minimizing the number of negative points inside the convex polyhedron whenever exact separation cannot be achieved. We introduce models inspired by support vector machines and design two mathematical programming formulations with binary variables. We exploit Dantzig-Wolfe decomposition to obtain extended formulations, and we devise column generation algorithms with ad hoc pricing routines. We compare the computing times and separation errors obtained by all our approaches on synthetic datasets, with the number of points ranging from hundreds up to a few thousand, and show that our approaches perform better than existing ones from the literature. Furthermore, we observe that key computational differences arise depending on whether or not the budget K is sufficient to completely separate the positive points from the negative ones. On instances of dimension 8 and above, existing convex hull algorithms become computationally inapplicable, while our algorithms identify good convex hull approximations within minutes of computation.
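To make the objective concrete, the following minimal sketch evaluates a candidate polyhedron against labelled points. It is an illustration only, not the formulations or algorithms of the paper: the representation of the polyhedron as {x : Wx ≤ b}, the function name `separation_error`, the tolerance `tol`, and the toy data are all assumptions introduced here.

```python
import numpy as np

def separation_error(W, b, positives, negatives, tol=1e-9):
    """Evaluate a candidate polyhedron {x : W @ x <= b} with K = len(b) faces.

    Returns a pair (positives left outside, negatives left inside).
    Points on the boundary are counted as inside (an assumption of this sketch).
    """
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)
    # A point is inside only if it satisfies all K inequalities.
    pos_inside = np.all(np.asarray(positives, dtype=float) @ W.T <= b + tol, axis=1)
    neg_inside = np.all(np.asarray(negatives, dtype=float) @ W.T <= b + tol, axis=1)
    return int(np.sum(~pos_inside)), int(np.sum(neg_inside))

# Hypothetical 2-D toy data: the polyhedron is the unit square (K = 4).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
positives = np.array([[0.2, 0.2], [0.8, 0.5], [0.5, 0.9]])
# The second negative lies inside the square but outside the hull of the positives.
negatives = np.array([[1.5, 0.5], [0.9, 0.1], [-0.3, 0.2]])

missed, errors = separation_error(W, b, positives, negatives)
print(missed, errors)
```

In this toy case every positive point is contained, and one negative point remains inside the square, so the separation error of this particular candidate is 1. A different choice of four hyperplanes could cut that point off, since it lies outside the convex hull of the positive points.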