From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Action understanding has long attracted attention and can be formulated as a mapping from the physical space to the semantic space. Typically, researchers build datasets with idiosyncratic class definitions and push each benchmark forward separately. As a result, datasets are incompatible with one another, like "isolated islands", because of semantic gaps and differing class granularities, e.g., "do housework" in dataset A versus "wash plate" in dataset B. We argue that a more principled semantic space is needed to concentrate community effort and to use all datasets together in pursuit of generalizable action learning. To this end, we design a structured action semantic space built on a verb taxonomy hierarchy and covering a massive number of actions. By aligning the classes of previous datasets to our semantic space, we gather (image/video/skeleton/MoCap) datasets into a unified database under a unified label system, i.e., bridging the "isolated islands" into a "Pangea". Accordingly, we propose a novel model that maps from the physical space to the semantic space in order to fully use Pangea. In extensive experiments, our new system shows significant superiority, especially in transfer learning. Our code and data will be made public at https://mvig-rhos.com/pangea.
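The idea of aligning dataset-specific classes to one hierarchical semantic space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy hierarchy, the node names, the placement of "wash" under "do_housework", and the helper functions are hypothetical and are not taken from the paper's actual taxonomy or code. It only shows how a coarse class from one dataset and a fine class from another could end up sharing labels once both are mapped to nodes of the same tree.

```python
# Hypothetical toy hierarchy, stored as child -> parent (None marks the root).
# The real semantic space in the paper is built on a verb taxonomy; this tree
# is invented purely for illustration.
PARENT = {
    "act": None,
    "do_housework": "act",
    "wash": "do_housework",
    "wash_plate": "wash",
    "move": "act",
    "run": "move",
}

# Hypothetical alignment table: (dataset, original class name) -> hierarchy node.
ALIGNMENT = {
    ("dataset_A", "do housework"): "do_housework",
    ("dataset_B", "wash plate"): "wash_plate",
}


def ancestors(node):
    """Return the node followed by all of its ancestors, specific to general."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def unified_labels(dataset, class_name):
    """Map a dataset-specific class to the set of hierarchy nodes it implies."""
    node = ALIGNMENT[(dataset, class_name)]
    return set(ancestors(node))


labels_a = unified_labels("dataset_A", "do housework")
labels_b = unified_labels("dataset_B", "wash plate")

# Nodes that both samples can supervise, despite different class granularity.
shared = labels_a & labels_b
print(sorted(shared))
```

Under these assumptions, a "wash plate" sample also counts as a positive example for the coarser "do_housework" node, which is one simple way two datasets with different granularities could be trained together under a single label system.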

