Action segmentation is a challenging task in high-level process analysis, typically performed on video or on kinematic data obtained from various sensors. In surgical procedures, action segmentation is critical for workflow analysis algorithms. This work makes two contributions to action segmentation on kinematic data. First, we introduce two multi-stage architectures, MS-TCN-BiLSTM and MS-TCN-BiGRU, designed specifically for kinematic data. Each architecture consists of a prediction generator with intra-stage regularization, followed by refinement stages based on bidirectional LSTMs or GRUs. Second, we propose two new data augmentation techniques, World Frame Rotation and Horizontal-Flip, which exploit the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both open-surgery simulation datasets that we collected, and the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieve state-of-the-art performance on all benchmark datasets and establish a strong baseline for the BRS dataset.
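To make the two augmentations more concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a hypothetical data layout in which each frame maps a tool name ("left", "right") to an (x, y, z) position in the world frame, with z as the vertical axis. Real kinematic streams also carry orientations and velocities, which would need the corresponding transformation. In this sketch, World Frame Rotation draws one random angle per sequence and rotates every frame about the vertical axis, so inter-tool distances and trajectory shapes are preserved. Horizontal-Flip mirrors the x-axis and swaps the left and right hand channels; whether the method swaps hands this way, and whether hand-specific gesture labels must be swapped too, is an assumption.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
# Hypothetical layout: one dict per time step, tool name -> (x, y, z) world position.
Frame = Dict[str, Point]


def rotate_about_z(p: Point, theta: float) -> Point:
    """Rotate a point about the (assumed) vertical z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def world_frame_rotation(
    seq: List[Frame], max_angle: float = math.pi, rng: Optional[random.Random] = None
) -> List[Frame]:
    """Apply ONE random z-rotation to every frame of the sequence.

    Using a single angle for the whole sequence keeps the motion's geometry
    intact; only the viewpoint of the (assumed) world frame changes.
    """
    rng = rng or random.Random()
    theta = rng.uniform(-max_angle, max_angle)
    return [{tool: rotate_about_z(p, theta) for tool, p in frame.items()} for frame in seq]


def horizontal_flip(seq: List[Frame]) -> List[Frame]:
    """Mirror the x-axis and swap left/right tool channels (assumed convention)."""
    swap = {"left": "right", "right": "left"}
    out: List[Frame] = []
    for frame in seq:
        flipped: Frame = {}
        for tool, (x, y, z) in frame.items():
            flipped[swap.get(tool, tool)] = (-x, y, z)
        out.append(flipped)
    return out
```

A quick usage check: after `world_frame_rotation`, the distance between the two tools in each frame is unchanged and the z-coordinates are untouched, while applying `horizontal_flip` twice returns the original sequence.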