Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing Trimodal Data

In pervasive machine learning, especially in Human Behavior Analysis (HBA), RGB has been the primary modality because it is accessible and information-rich. These benefits come with challenges, however, including sensitivity to lighting conditions and privacy concerns. One way to overcome these weaknesses is to use other modalities. For instance, thermal imaging is particularly good at accentuating human forms, while depth adds crucial contextual information. Despite these known benefits, only a few HBA-specific datasets integrate these modalities. To address this shortage, our research introduces a novel generative technique for creating trimodal (RGB, thermal, and depth) human-focused datasets. The technique relies on human segmentation masks derived from RGB images, combined with thermal and depth backgrounds that are sourced automatically. With these two ingredients, we synthesize depth and thermal counterparts of existing RGB data using conditional image-to-image translation. The resulting trimodal data can be used to train models for settings with limited data, poor lighting conditions, or privacy-sensitive areas.
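The abstract does not say how the segmentation mask and the sourced background are combined before translation. As a minimal sketch, assuming the two are simply stacked as input channels for a conditional image-to-image model, the conditioning input could be assembled as below. The function name `build_condition`, the channel-stacking scheme, and the min-max normalization are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def build_condition(mask: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Stack a human mask and a single-channel background into a (2, H, W) array.

    Hypothetical helper: the stacking scheme is an assumption, not taken
    from the paper.
    """
    if mask.shape != background.shape:
        raise ValueError("mask and background must share the same (H, W) shape")

    # Binarize the mask: 1.0 where a person is present, 0.0 elsewhere.
    mask_f = (mask > 0).astype(np.float32)

    # Min-max normalize the background (thermal or depth) to [0, 1].
    bg = background.astype(np.float32)
    span = bg.max() - bg.min()
    bg_norm = (bg - bg.min()) / span if span > 0 else np.zeros_like(bg)

    # Channel 0: human mask, channel 1: normalized background.
    return np.stack([mask_f, bg_norm], axis=0)


# Toy example: a 4x4 mask with a 2x2 "person" and a synthetic thermal background.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
thermal_bg = np.linspace(20.0, 30.0, 16).reshape(4, 4)

cond = build_condition(mask, thermal_bg)
print(cond.shape)
```

Under this assumption, the same helper would serve both target modalities: only the background passed in (thermal or depth) changes, while the mask derived from the RGB frame stays fixed.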


