WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning

To date, the most widely adopted way to collect fixations in panoptic video relies on a head-mounted display (HMD): users wear an HMD and explore the given panoptic scene freely while their fixations are recorded. However, this data collection method is insufficient for training deep models to accurately predict which regions of a given panoptic scene are most important when the scene contains intermittent salient events. The main reason is that "blind zooms" always exist when fixations are collected with an HMD, since users cannot keep turning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in a few local views, leaving the remaining areas as "blind zooms". Fixation data collected by HMD-based methods, which accumulate local views, therefore cannot accurately represent the overall global importance of complex panoptic scenes, which is the main purpose of fixations. To address this, this paper introduces the auxiliary window with dynamic blurring (WinDB) fixation collection approach for panoptic video, which does not need an HMD and reflects region-wise importance well. Using WinDB, we have released a new dataset, PanopticVideo-300, containing 300 panoptic clips covering over 225 categories. Because fixation collection with WinDB is free of blind zooms, the new dataset exhibits frequent and intensive "fixation shifting", a very special phenomenon that has long been overlooked by previous research. We therefore present an effective fixation shifting network (FishNet) to handle it. Together, the new fixation collection tool, dataset, and network have the potential to open a new age for fixation-related research and applications in 360° environments.

