Clustering trajectories is a central challenge when confronted with large amounts of movement data such as full-body motion data or GPS data. We study a clustering problem that can be stated as a geometric set cover problem: Given a polygonal curve of complexity $n$, find the smallest number $k$ of representative trajectories of complexity at most $l$ such that every point on the input trajectory lies on a subtrajectory of the input that has Fr\'echet distance at most $\Delta$ to one of the representative trajectories. This problem was first studied by Akitaya et al. (2021) and Br\"uning et al. (2022). They present a bicriteria approximation algorithm that returns a set of curves of size $O(kl\log(kl))$ which covers the input with a radius of $11\Delta$ in time $\widetilde{O}((kl)^2n + kln^3)$, where $k$ is the smallest number of curves of complexity $l$ needed to cover the input with a distance of $\Delta$. The representative trajectories computed by their algorithm are always line segments. In applications, however, one is usually interested in representative curves of higher complexity that consist of several edges. We present a new approach that builds upon the work of Br\"uning et al. (2022) and computes a set of curves of size $O(k\log(n))$ in time $\widetilde{O}(l^2n^4 + kln^4)$ with the same distance guarantee of $11\Delta$, where each representative curve may have complexity up to the given parameter $l$. To validate our approach, we conduct experiments on different types of real-world data: high-dimensional full-body motion data and low-dimensional GPS-tracking data.
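The covering condition hinges on the Fr\'echet distance between a subtrajectory and a representative curve. As an illustration only, and not the authors' algorithm, the sketch below computes the classic *discrete* Fr\'echet distance by dynamic programming over the vertices of two polygonal curves. The problem above uses the continuous Fr\'echet distance; the discrete value is always an upper bound on it, so a test based on it is conservative. The helper name `covered_within` is hypothetical.

```python
import math


def discrete_frechet(p, q):
    """Discrete Frechet distance between two non-empty vertex sequences p and q.

    Note: this is the discrete variant, which upper-bounds the continuous
    Frechet distance used in the problem statement.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("curves must be non-empty")
    # ca[i][j] = discrete Frechet distance between p[:i+1] and q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance, any dimension
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best predecessor among the three monotone moves.
                best = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best, d)
    return ca[n - 1][m - 1]


def covered_within(sub, rep, delta):
    """Hypothetical helper: conservative check that `sub` is within delta of `rep`."""
    return discrete_frechet(sub, rep) <= delta


# Usage: a slightly bent subtrajectory against a single-segment representative.
sub = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0)]
rep = [(0.0, 0.0), (2.0, 0.0)]
print(discrete_frechet(sub, rep))  # sqrt(1.04), about 1.0198
```

The example also shows why the discrete variant is only a stand-in: the continuous Fr\'echet distance between these two curves is $0.2$, but the discrete value is about $1.02$ because the middle vertex of `sub` must be matched to an endpoint of `rep`.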