Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. This work presents two contributions to action segmentation on kinematic data. First, we introduce two versions of Multi-Stage Temporal Convolutional Recurrent Networks (MS-TCRNet), designed specifically for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and bidirectional LSTM- or GRU-based refinement stages. Second, we propose two new data augmentation techniques, World Frame Rotation and Hand Inversion, which exploit the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieve state-of-the-art performance.
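To make the two geometric augmentations concrete, the following is a minimal sketch of how they could operate on kinematic sensor trajectories. The data layout (T frames x S sensors x 3D positions), the choice of rotation axis, the mirror plane, and the function names are illustrative assumptions, not the authors' implementation; the paper's method may also transform orientations and velocities.

```python
# Hedged sketch of the two augmentations named in the abstract.
# Assumption: kinematic data is a (T, S, 3) array of 3D sensor positions.
import numpy as np
from scipy.spatial.transform import Rotation


def world_frame_rotation(positions, max_angle_deg=180.0, rng=None):
    """Apply one random rotation about the world frame's vertical (z) axis
    to all sensors in a trial; labels are unchanged, trajectories rotate."""
    rng = np.random.default_rng() if rng is None else rng
    angle = rng.uniform(-max_angle_deg, max_angle_deg)
    R = Rotation.from_euler("z", angle, degrees=True).as_matrix()  # (3, 3)
    return positions @ R.T


def hand_inversion(positions, left_idx, right_idx, mirror_axis=0):
    """Mirror the scene across one world axis and swap the left/right hand
    sensor channels, yielding a plausible opposite-handed trial.
    (Illustration only: a full implementation would also flip orientations.)"""
    mirrored = positions.copy()
    mirrored[..., mirror_axis] *= -1.0  # reflect across the chosen plane
    swapped = mirrored.copy()
    swapped[:, left_idx], swapped[:, right_idx] = (
        mirrored[:, right_idx], mirrored[:, left_idx])  # swap hand channels
    return swapped


# Usage on a synthetic trial: 1000 frames, 2 hand sensors.
x = np.random.randn(1000, 2, 3)
x_rot = world_frame_rotation(x)
x_inv = hand_inversion(x, left_idx=0, right_idx=1)
```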