Clustering Permutations under the Ulam Metric: A Parameterized Complexity Study

Rank aggregation seeks a representative permutation for a collection of rankings and plays a central role in areas such as social choice, information retrieval, and computational biology. Two fundamental aggregation tasks are the center and median problems, which minimize the maximum and the total distance to the input permutations, respectively. While these problems are well understood under Kendall's tau and related distances, their parameterized complexity under the Ulam metric, an edit-distance-based metric on permutations, has remained largely unexplored. In this work, we initiate a systematic study of the parameterized complexity of rank aggregation under the Ulam metric. We consider both the center and median problems, as well as their generalizations to the $k$-center and $k$-median clustering settings, parameterized by the number of centers $k$ and the distance budget $d$ (corresponding to the maximum distance for center variants and the total distance for median variants). Both problems are known to be NP-hard already for $k=1$. We show that the Ulam $k$-center problem remains NP-hard when $d=1$, but is fixed-parameter tractable when parameterized by $k + d$. Our algorithm is based on a novel local-search framework tailored to the non-local nature of Ulam distances. We complement this by proving that no polynomial kernel exists for the $k+d$ parameterization unless NP $\subseteq$ coNP/poly. For the Ulam $k$-median problem parameterized by the total distance $d$, we establish W[1]-hardness and provide an XP algorithm. We also provide a polynomial kernel for the parameter $k + d$, which in turn yields a fixed-parameter tractable algorithm.
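To make the objectives concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the common definition of the Ulam distance between two permutations of the same $n$ elements as $n$ minus the length of their longest common subsequence, which is the minimum number of "remove one element and reinsert it elsewhere" moves. The function names (`ulam_distance`, `k_center_cost`, `k_median_cost`) are illustrative and do not come from the source. The sketch only evaluates the cost of a given set of candidate centers; it does not search for optimal ones.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """Ulam distance, assumed here to be n - LCS(p, q) for permutations p, q."""
    # Relabel p by the positions its elements take in q; the LCS of two
    # permutations then equals the longest increasing subsequence (LIS).
    pos = {v: i for i, v in enumerate(q)}
    seq = [pos[v] for v in p]
    tails = []  # tails[i] = smallest tail of an increasing subsequence of length i + 1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(p) - len(tails)


def k_center_cost(centers, perms):
    """Maximum distance from any input permutation to its nearest center."""
    return max(min(ulam_distance(c, p) for c in centers) for p in perms)


def k_median_cost(centers, perms):
    """Total distance from the input permutations to their nearest centers."""
    return sum(min(ulam_distance(c, p) for c in centers) for p in perms)


if __name__ == "__main__":
    perms = [[1, 2, 3, 4], [2, 3, 4, 1], [4, 3, 2, 1]]
    print(k_center_cost([[1, 2, 3, 4]], perms))  # center objective, k = 1
    print(k_median_cost([[1, 2, 3, 4]], perms))  # median objective, k = 1
```

With $k=1$ these reduce to the center and median objectives. In the decision versions studied here, the question is whether some choice of $k$ centers brings the respective cost down to at most $d$.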
