Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

We focus on a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics: (1) an amorphous shape with indistinct boundaries; (2) similarity to the surroundings; and (3) absence of color. Consequently, insubstantial objects are far more challenging to distinguish in a single static frame, which makes a collaborative representation of spatial and temporal information crucial. To this end, we construct the IOD-Video dataset, comprising 600 videos (141,017 frames) that cover various distances, sizes, visibility levels, and scenes captured in different spectral ranges. In addition, we develop a spatio-temporal aggregation framework for IOD, in which different backbones are deployed and a spatio-temporal aggregation loss (STAloss) is carefully designed to exploit consistency along the time axis. Experiments on the IOD-Video dataset demonstrate that spatio-temporal aggregation significantly improves IOD performance. We hope this work will encourage further research into this valuable yet challenging task. The code will be available at: \url{https://github.com/CalayZhou/IOD-Video}.
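The abstract does not define STAloss, so the snippet below is only a hypothetical illustration of the general idea it names: penalizing predictions that are inconsistent along the time axis. It assumes one object has a predicted box `(x1, y1, x2, y2)` in each of several consecutive frames and measures the mean L1 change between adjacent frames. The function name, the box format, and the per-coordinate averaging are all assumptions for illustration, not the authors' method.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def temporal_consistency_penalty(boxes: Sequence[Box]) -> float:
    """Mean per-coordinate L1 change between boxes in adjacent frames.

    Hypothetical sketch only: this is NOT the paper's STAloss, just one
    simple way to penalize predictions that jump along the time axis.
    """
    if len(boxes) < 2:
        return 0.0  # a single frame has no temporal neighbour to compare with
    total = 0.0
    for prev, curr in zip(boxes[:-1], boxes[1:]):
        # L1 distance over the four coordinates, averaged per coordinate
        total += sum(abs(p - c) for p, c in zip(prev, curr)) / 4.0
    # Average over the number of adjacent frame pairs
    return total / (len(boxes) - 1)


if __name__ == "__main__":
    track = [(10, 10, 50, 50), (12, 10, 52, 50), (12, 10, 52, 50)]
    print(temporal_consistency_penalty(track))  # 0.5
```

In a training loop, a term like this would typically be added to a per-frame detection loss with some weight (a hypothetical hyperparameter), so that the detector is rewarded for temporally stable predictions on objects whose appearance in any single frame is ambiguous.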
