In pervasive machine learning, especially in Human Behavior Analysis (HBA), RGB has been the primary modality due to its accessibility and richness of information. These benefits, however, come with challenges, including sensitivity to lighting conditions and privacy concerns. One way to overcome these vulnerabilities is to resort to other modalities: thermal imaging is particularly adept at accentuating human forms, while depth adds crucial contextual layers. Despite these known benefits, only a few HBA-specific datasets integrate these modalities. To address this shortage, our research introduces a novel generative technique for creating trimodal (RGB, thermal, and depth) human-focused datasets. The technique combines human segmentation masks derived from RGB images with thermal and depth backgrounds that are sourced automatically. From these two ingredients, we synthesize depth and thermal counterparts of existing RGB data using conditional image-to-image translation. The resulting trimodal data can be leveraged to train models for settings with limited data, poor lighting conditions, or privacy-sensitive areas.
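As a rough illustration of how the two ingredients named above (a human segmentation mask and a target-modality background) might be assembled into a conditioning input for an image-to-image translation model, consider the minimal sketch below. The abstract does not specify how the mask and background are combined, so the two-channel stacking, the blanking of the person region, and the function name `build_condition` are all assumptions made for illustration, not the method itself.

```python
from typing import List

# Single-channel image stored as rows of floats in [0, 1] (illustrative type alias).
Image = List[List[float]]


def build_condition(mask: Image, background: Image) -> List[List[List[float]]]:
    """Stack a binary human mask and a thermal or depth background per pixel.

    Hypothetical helper: returns an H x W x 2 nested list where
    channel 0 is the mask and channel 1 is the background with the
    person region blanked out, leaving that region for a generator to fill.
    """
    same_height = len(mask) == len(background)
    same_widths = all(len(m) == len(b) for m, b in zip(mask, background))
    if not (same_height and same_widths):
        raise ValueError("mask and background must have the same shape")

    condition = []
    for mask_row, bg_row in zip(mask, background):
        row = []
        for m, b in zip(mask_row, bg_row):
            # Keep the background only where no person is present (m == 0).
            row.append([m, b * (1.0 - m)])
        condition.append(row)
    return condition


if __name__ == "__main__":
    mask = [[0.0, 1.0], [1.0, 0.0]]          # 1.0 marks person pixels
    background = [[0.2, 0.4], [0.6, 0.8]]    # e.g. a normalized depth background
    print(build_condition(mask, background))
```

In such a setup, a conditional generator would receive this stacked input and be trained to output the full thermal or depth image; the generator itself is outside the scope of this sketch.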