We consider the problem of counting the copies of a length-$k$ pattern $\sigma$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $\sigma(j) < \sigma(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant. Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] demonstrated that the detection variant is solvable in $O(n)$ time for any fixed $k$. Their proof laid the foundations for the discovery of twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{\Omega(k / \log k)}$ [Berendsohn, Kozma, and Marx, 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020]. In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $\sigma$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact algorithms for pattern counting. Interestingly, our algorithm leverages the Birg\'e decomposition (a sublinear tool for monotone distributions widely used in distribution testing), which, to our knowledge, has not previously been applied to pattern counting.
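To make the definition of a $\sigma$-copy concrete, the following is a minimal brute-force sketch that checks every $k$-subset of indices against the "if and only if" condition stated above. It is only an illustrative baseline running in $O(n^k \cdot k^2)$ time, not the near-linear approximation algorithm of this work; the function names `is_copy` and `count_copies_bruteforce` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_copy(values, pattern):
    """Return True if values[j] < values[l] exactly when pattern[j] < pattern[l]."""
    k = len(pattern)
    return all(
        (values[j] < values[l]) == (pattern[j] < pattern[l])
        for j in range(k)
        for l in range(k)
    )


def count_copies_bruteforce(f, pattern):
    """Count index sets i_1 < ... < i_k whose values in f form a copy of pattern.

    Illustrative exhaustive baseline (hypothetical helper): it enumerates all
    k-subsets of positions, so it is only practical for very small inputs.
    """
    k = len(pattern)
    return sum(
        1
        for idx in combinations(range(len(f)), k)  # idx is already increasing
        if is_copy([f[i] for i in idx], pattern)
    )


if __name__ == "__main__":
    f = [2, 5, 1, 4, 3]
    print(count_copies_bruteforce(f, (1, 2)))     # increasing pairs
    print(count_copies_bruteforce(f, (1, 3, 2)))  # copies of the pattern 132
```

Note that under the strict "if and only if" condition, index sets containing two equal values of $f$ never form a copy, since $\sigma$ is a permutation and therefore has no ties.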