The top-$k$-sum operator computes the sum of the largest $k$ components of a given vector. The Euclidean projection onto the top-$k$-sum constraint serves as a crucial subroutine in iterative methods for solving composite superquantile optimization problems. In this paper, we introduce a solver that implements two finite-termination algorithms to compute this projection. Both algorithms have complexity $O(n)$ when applied to a sorted $n$-dimensional input vector, with a hidden constant that is independent of $k$. This stands in contrast to the existing grid-search-inspired method, which has $O(k(n-k))$ complexity. The improvement is significant when $k$ scales linearly with $n$, which is frequently the case in practical superquantile optimization applications. When the input vector is unsorted, an additional cost is incurred to (partially) sort it. To reduce this cost, we further derive a rigorous procedure that leverages approximate sorting to compute the projection, which is particularly useful when solving a sequence of similar projection problems. Numerical results show that our methods solve problems of scale $n=10^7$ and $k=10^4$ within $0.05$ seconds, whereas the existing grid-search-based method and the Gurobi QP solver can take minutes to hours.
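To make the operator concrete, the following is a minimal sketch of evaluating the top-$k$-sum of an unsorted vector with a partial sort, together with a check of the constraint that the top-$k$-sum does not exceed a bound. It is not one of the projection algorithms described above; the function names `top_k_sum` and `is_feasible`, the bound `r`, and the tolerance `tol` are illustrative assumptions, and NumPy's `np.partition` is used only as one convenient way to partially sort.

```python
import numpy as np


def top_k_sum(x, k):
    """Return the sum of the k largest entries of x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # After partitioning at index n - k, the last k slots hold the k largest
    # entries (in arbitrary order), so a full sort is not needed.
    return float(np.partition(x, n - k)[n - k:].sum())


def is_feasible(x, k, r, tol=1e-12):
    """Check the top-k-sum constraint T_k(x) <= r (r and tol are assumed names)."""
    return top_k_sum(x, k) <= r + tol
```

For example, `top_k_sum([3, 1, 4, 1, 5], 2)` adds the two largest entries, 5 and 4. A vector that already satisfies `is_feasible` is its own projection, so a projection routine only has work to do when this check fails.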