Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation

Yang Liu, Xiaoyun Zhong, Shiyao Zhai, Zhicheng Du, Zhenyuan Gao, Qiming Huang, Canyang Zhang, Bin Jiang, Vijay Kumar Pandey, Sanyang Han, Runming Wang, Yuxing Han, Peiwu Qin

From arXiv, Transformer for Cardiopulmonary Resuscitation

Most people who suffer an unexpected cardiac arrest receive cardiopulmonary resuscitation (CPR) from bystanders in a desperate attempt to save their lives, but these efforts often fail because the rescuers lack adequate training. Fortunately, many studies show that disciplined training raises the success rate of resuscitation, and such training continues to benefit from the integration of new techniques. To this end, we collect a custom CPR video dataset in which trainees independently perform resuscitation on mannequins in adherence to approved guidelines, and we build an auxiliary toolbox that uses modern deep learning methods to support supervision and the correction of potential issues that arise during the procedure. We formulate the problem as a temporal action segmentation (TAS) task in computer vision, which aims to segment an untrimmed video at the frame level. We propose a Prompt-enhanced hierarchical Transformer (PhiTrans) that integrates three modules: a textual prompt-based Video Features Extractor (VFE), a transformer-based Action Segmentation Executor (ASE), and a regression-based Prediction Refinement Calibrator (PRC). The backbone of the model is first developed on three established public TAS datasets (GTEA, 50Salads, and Breakfast), which informs the design of the segmentation pipeline for the CPR dataset. Overall, we present what is, to our knowledge, the first exploration of a feasible pipeline that improves the quality of CPR instruction via action segmentation combined with current deep learning techniques. Experiments support our implementation, with multiple metrics surpassing 91.0%.
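To make the frame-level formulation of TAS concrete, the following is a minimal sketch of how per-frame labels collapse into contiguous action segments. It illustrates the task output only and is not the PhiTrans implementation; the function name `frames_to_segments` and the action labels are hypothetical and do not come from the paper.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    # groupby yields one run per stretch of consecutive identical labels
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 9-frame clip (labels are assumptions)
frame_labels = (
    ["background"] * 2 + ["chest_compression"] * 4 + ["rescue_breath"] * 3
)
segments = frames_to_segments(frame_labels)
for label, start, end in segments:
    print(f"{label}: frames {start}..{end - 1}")
```

A segmentation model in this setting predicts one label per frame; grouping consecutive identical labels as above yields the segment boundaries that segment-level TAS metrics are typically computed on.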
