The Cartesian tree of a sequence captures the relative order of the sequence's elements. In recent years, Cartesian tree matching has attracted considerable attention, particularly due to its applications in time series analysis. Consider a text $T$ of length $n$ and a pattern $P$ of length $m$. In the exact Cartesian tree matching problem, the task is to find all length-$m$ fragments of $T$ whose Cartesian tree coincides with the Cartesian tree $CT(P)$ of the pattern. Although the exact version of the problem can be solved in linear time [Park et al., TCS 2020], it remains rather restrictive; for example, it is not robust to outliers in the pattern. To overcome this limitation, we consider the approximate setting, where the goal is to identify all fragments of $T$ that are close to some string whose Cartesian tree matches $CT(P)$. In this work, we quantify closeness via the widely used Hamming distance metric. For a given integer parameter $k>0$, we present an algorithm that computes all fragments of $T$ that are at Hamming distance at most $k$ from a string whose Cartesian tree matches $CT(P)$. Our algorithm runs in time $\mathcal O(n \sqrt{m} \cdot k^{2.5})$ for $k \leq m^{1/5}$ and in time $\mathcal O(nk^5)$ for $k \geq m^{1/5}$, thereby improving upon the state-of-the-art $\mathcal O(nmk)$-time algorithm of Kim and Han [TCS 2025] in the regime $k = o(m^{1/4})$. On the way to our solution, we develop a toolbox of independent interest. First, we introduce a new notion of periodicity in Cartesian trees. Then, we lift multiple well-known combinatorial and algorithmic results for string matching and periodicity in strings to Cartesian tree matching and periodicity in Cartesian trees.
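To make the exact matching problem concrete, the following is a minimal sketch, not the algorithm of this work or of the cited papers. It assumes the usual convention that the root of the Cartesian tree is the minimum element, with ties broken toward the leftmost occurrence, and it compares Cartesian trees through a parent-distance encoding (for each position, the distance to the nearest earlier position holding a value that is not larger, or 0 if there is none). The function names `parent_distance` and `naive_ct_match` are hypothetical, and the matcher is a simple quadratic-time baseline rather than a linear-time method.

```python
def parent_distance(seq):
    """Encode the Cartesian tree of seq (min-rooted, leftmost tie-breaking).

    Entry i is the distance from i to the nearest j < i with
    seq[j] <= seq[i], or 0 if no such j exists. Under the stated
    convention, two sequences of equal length have the same Cartesian
    tree exactly when these encodings coincide.
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(seq):
        # Discard indices holding strictly larger values than x.
        while stack and seq[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def naive_ct_match(text, pattern):
    """Return start positions of length-m fragments of text whose
    Cartesian tree equals that of pattern (O(n * m) baseline)."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    # The pattern has shape "high, low, higher".
    print(naive_ct_match([5, 3, 8, 4, 1, 6], [2, 1, 3]))
```

For the text `[5, 3, 8, 4, 1, 6]` and the pattern `[2, 1, 3]`, the fragments starting at positions 0 and 3 (0-indexed) share the pattern's Cartesian tree. The approximate setting considered here would additionally accept fragments that become such a match after changing at most $k$ positions, which this sketch does not attempt.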