Purpose: Myocardium segmentation in echocardiography videos is a challenging task due to low contrast, noise, and anatomical variability. Traditional deep learning models either process frames independently, ignoring temporal information, or rely on memory-based feature propagation, which accumulates error over time. Methods: We propose Point-Seg, a transformer-based segmentation framework that integrates point tracking as a temporal cue to ensure stable and consistent segmentation of the myocardium across frames. Our method leverages a point-tracking module trained on a synthetic echocardiography dataset to track key anatomical landmarks across video sequences. These tracked trajectories provide an explicit motion-aware signal that guides segmentation, reducing drift and eliminating the need for memory-based feature accumulation. Additionally, we incorporate a temporal smoothing loss to further enhance temporal consistency across frames. Results: We evaluate our approach on both public and private echocardiography datasets. Experimental results demonstrate that Point-Seg achieves Dice accuracy statistically comparable to state-of-the-art segmentation models on high-quality echo data, while achieving higher segmentation accuracy and improved temporal stability on lower-quality echo. Furthermore, unlike other segmentation methods, Point-Seg provides pixel-level myocardial motion information. Such information is essential for downstream tasks such as myocardial strain measurement and regional wall motion abnormality detection. Conclusion: Point-Seg demonstrates that point tracking can serve as an effective temporal cue for consistent video segmentation, offering a reliable and generalizable approach for myocardium segmentation in echocardiography videos. The code is available at https://github.com/DeepRCL/PointSeg.
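The abstract does not specify the exact form of the temporal smoothing loss. A minimal, illustrative sketch of one common choice, an L1 penalty on differences between consecutive predicted masks (the actual Point-Seg loss may differ), might look like:

```python
import numpy as np

def temporal_smoothing_loss(masks: np.ndarray) -> float:
    """Penalize frame-to-frame changes in predicted myocardium masks.

    masks: array of shape (T, H, W) holding per-pixel probabilities
    in [0, 1] for T consecutive frames.

    Returns the mean absolute difference between consecutive frames,
    which is 0 for perfectly stable predictions. This L1 form is an
    illustrative assumption, not the loss defined in the paper.
    """
    diffs = np.abs(masks[1:] - masks[:-1])  # shape (T-1, H, W)
    return float(diffs.mean())
```

In training, such a term would be added to the per-frame segmentation loss (e.g., Dice or cross-entropy) with a weighting coefficient, so that the network trades off per-frame accuracy against frame-to-frame stability.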