Deep learning approaches for heart-sound (PCG) segmentation built on time-frequency features can be accurate but often rely on large expert-labeled datasets, limiting robustness and deployment. We present TopSeg, a topological representation-centric framework that encodes PCG dynamics with multi-scale topological features and decodes them using a lightweight temporal convolutional network (TCN) with an order- and duration-constrained inference step. To evaluate data efficiency and generalization, we train exclusively on the PhysioNet 2016 dataset with subject-level subsampling and perform external validation on the CirCor dataset. Under matched-capacity decoders, the topological features consistently outperform spectrogram and envelope inputs, with the largest margins at low data budgets; as a full system, TopSeg surpasses representative end-to-end baselines trained on their native inputs under the same budgets while remaining competitive at full data. Ablations at the 10% training budget confirm that all scales contribute and that combining H_0 and H_1 features yields more reliable S1/S2 localization and boundary stability. These results indicate that topology-aware representations provide a strong inductive bias for data-efficient, cross-dataset PCG segmentation, supporting practical use when labeled data are limited.
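The H_0 features mentioned above come from persistent homology of the signal. As a minimal illustration (an assumption about the general technique, not the authors' implementation), the following sketch computes the 0-dimensional persistence pairs of a 1-D signal under the sublevel-set filtration with a union-find and the elder rule; practical pipelines would typically use a TDA library such as GUDHI or Ripser and add multi-scale preprocessing.

```python
def sublevel_h0_persistence(signal):
    """Return (birth, death) pairs of connected components in the
    sublevel-set filtration of a 1-D signal; the component born at the
    global minimum never dies (death = +inf). Zero-persistence pairs
    are discarded."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])  # sweep values low to high
    parent = [None] * n   # union-find; None means not yet in the filtration
    birth = {}            # component root -> birth value
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):               # merge with already-born neighbors
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # elder rule: the younger component (larger birth) dies here
                    young, old = (ri, rj) if birth[ri] >= birth[rj] else (rj, ri)
                    pairs.append((birth[young], signal[i]))
                    parent[young] = old
    pairs = [(b, d) for (b, d) in pairs if d > b]   # drop zero-persistence pairs
    pairs.append((birth[find(order[0])], float("inf")))
    return sorted(pairs)
```

For example, `sublevel_h0_persistence([1, 0, 2, -1, 3])` yields `[(-1, inf), (0, 2)]`: one essential component born at the global minimum, and one finite pair for the secondary valley that merges at the intervening peak. Summaries of such diagrams (e.g., per-window persistence statistics) are the kind of topological descriptor a segmentation model can consume.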