This study uses domain randomization to generate a synthetic RGB-D dataset for training multimodal instance segmentation models, with the aim of achieving colour-agnostic hand localization in cluttered industrial environments. Domain randomization is a simple technique for addressing the "reality gap": unrealistic features are rendered at random in the simulated scene, which forces the neural network to learn the essential features of the domain. We provide a new synthetic dataset for a range of hand detection applications in industrial environments, together with ready-to-use pretrained instance segmentation models. To achieve robust results in a complex, unstructured environment, we use multimodal input comprising both colour and depth information, which we hypothesize improves the accuracy of the model's predictions. To test this hypothesis, we analyze the influence of each modality and of their synergy. Although the evaluated models were trained solely on our synthetic dataset, we show that they outperform corresponding models trained on existing state-of-the-art datasets in terms of Average Precision and Probability-based Detection Quality.
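As a rough illustration of the domain randomization idea described above, the following minimal sketch samples randomized scene parameters that a renderer could consume. It is not the pipeline used in this study: the `SceneParams` fields, the value ranges, and the helper names are all hypothetical and chosen only to show how deliberately unrealistic colours and varying clutter might be drawn for each synthetic image.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical per-render settings; field names and ranges are assumptions."""
    hand_rgb: tuple           # random, deliberately unrealistic hand colour
    background_rgb: tuple     # random background colour
    light_intensity: float    # arbitrary units
    camera_distance_m: float  # camera-to-hand distance in metres
    num_distractors: int      # number of clutter objects placed in the scene


def random_rgb(rng):
    """Draw one 8-bit RGB colour uniformly at random."""
    return tuple(rng.randint(0, 255) for _ in range(3))


def sample_scene(rng):
    """Sample one randomized scene description (illustrative ranges only)."""
    return SceneParams(
        hand_rgb=random_rgb(rng),
        background_rgb=random_rgb(rng),
        light_intensity=rng.uniform(0.2, 2.0),
        camera_distance_m=rng.uniform(0.4, 1.5),
        num_distractors=rng.randint(0, 20),
    )


def sample_dataset(n, seed=0):
    """Sample n scene descriptions reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


scenes = sample_dataset(1000)
```

Because the hand colour is drawn uniformly from the full RGB cube, a model trained on such renders cannot rely on skin tone and has to use shape and, in the multimodal case, depth cues, which is the intuition behind colour-agnostic localization.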