In this technical report, we present our findings on the egocentric action segmentation task of the Human-Object Interaction 4D (HOI4D) dataset. Because point cloud video understanding is a relatively new research area, existing point cloud video methods may be weak at temporal modeling, especially for long point cloud videos (\eg, 150 frames). In contrast, traditional video understanding methods are well developed, and their effectiveness at temporal modeling has been widely verified on many large-scale video datasets. We therefore convert point cloud videos into depth videos and apply traditional video modeling methods to improve 4D action segmentation. Ensembling depth and point cloud video methods significantly improves accuracy. The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
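The two steps named above, converting a point cloud frame into a depth frame and ensembling per-frame predictions, can be illustrated with a minimal sketch. It is not the DPMix implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the function names `points_to_depth` and `ensemble_frame_probs`, and the weighted averaging of class probabilities are all assumptions made for illustration, since the abstract does not specify the projection or the fusion rule.

```python
import numpy as np


def points_to_depth(points, fx, fy, cx, cy, height, width):
    """Project an (N, 3) camera-frame point cloud onto a (height, width) depth map.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy.
    Pixels that receive no point are set to 0.
    """
    points = np.asarray(points, dtype=np.float64)
    depth = np.full((height, width), np.inf)

    # Keep only points in front of the camera.
    in_front = points[:, 2] > 0
    x, y, z = points[in_front, 0], points[in_front, 1], points[in_front, 2]

    # Pinhole projection to integer pixel coordinates.
    u = np.round(fx * x / z + cx).astype(int)
    v = np.round(fy * y / z + cy).astype(int)

    # Discard points that fall outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # When several points hit the same pixel, keep the nearest one.
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth


def ensemble_frame_probs(prob_list, weights=None):
    """Fuse per-frame class probabilities from several models.

    prob_list: list of (T, C) arrays, one per model (T frames, C classes).
    weights:   optional per-model weights; uniform if omitted.
    Returns a (T,) array of predicted class indices.
    """
    probs = np.stack(prob_list)  # shape (M, T, C)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()

    # Weighted average over the model axis, then per-frame argmax.
    fused = np.tensordot(weights, probs, axes=1)  # shape (T, C)
    return fused.argmax(axis=-1)
```

In this sketch, each frame of a point cloud video would be passed through `points_to_depth` to build a depth video for a conventional video model, and the frame-level outputs of the depth and point cloud models would then be fused with `ensemble_frame_probs`.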