Cluster Editing, also known as Correlation Clustering, is a well-studied graph modification problem. In this problem, one is given a graph and the task is to perform up to $k$ edge additions or deletions to transform it into a cluster graph, i.e., a graph consisting of a disjoint union of cliques. However, in real-world networks, clusters are often overlapping. For example, in social networks a person might belong to several communities, such as those corresponding to work, school, or neighborhood. Other strong motivations come from biological network analysis and from language networks. In the latter, attempts to cluster words with similar usage can be confounded by homonyms, that is, words with multiple meanings such as "bat." In this paper, we introduce a new variant of Cluster Editing in which a vertex can be split into two or more vertices. First used in the context of graph drawing, this operation allows a vertex $v$ to be replaced by two vertices whose combined neighborhood is the neighborhood of $v$ (and thus $v$ can belong to more than one cluster). We call the new problem Cluster Editing with Vertex Splitting and initiate its study. We show that it is NP-complete and fixed-parameter tractable when parameterized by the total number $k$ of allowed vertex-splitting and edge-editing operations. In particular, we obtain an $O(2^{9k \log k} + n + m)$-time algorithm and a $6k$-vertex kernel.
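To make the vertex-splitting operation concrete, the following is a minimal Python sketch, not the algorithm of the paper. The function names `split_vertex` and `is_cluster_graph`, the adjacency-dictionary representation, and the choice that the two copies are not adjacent to each other are all assumptions made for illustration.

```python
def split_vertex(adj, v, part_a, part_b, new_names):
    """Replace v by two copies whose neighborhoods are part_a and part_b.

    adj maps each vertex to the set of its neighbors (simple undirected graph).
    Assumption: the two copies are not adjacent to each other.
    """
    part_a, part_b = set(part_a), set(part_b)
    if part_a | part_b != adj[v]:
        raise ValueError("the two neighborhoods must together equal N(v)")
    a, b = new_names
    # Copy the graph without v, then attach the two new vertices.
    result = {u: set(ns) - {v} for u, ns in adj.items() if u != v}
    result[a], result[b] = set(part_a), set(part_b)
    for u in part_a:
        result[u].add(a)
    for u in part_b:
        result[u].add(b)
    return result


def is_cluster_graph(adj):
    """Return True if every connected component of the graph is a clique."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        # In a clique, each vertex is adjacent to all other component members.
        if any(adj[x] != comp - {x} for x in comp):
            return False
    return True


# Two triangles {a, b, v} and {c, d, v} overlapping in the vertex v.
bowtie = {
    "a": {"b", "v"}, "b": {"a", "v"},
    "c": {"d", "v"}, "d": {"c", "v"},
    "v": {"a", "b", "c", "d"},
}
fixed = split_vertex(bowtie, "v", {"a", "b"}, {"c", "d"}, ("v1", "v2"))
```

In this example a single split turns the two overlapping triangles into two disjoint triangles, whereas edge edits alone would need at least two operations (for instance, deleting both edges from `v` to `c` and `d`).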