Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization

Predicting the scanpaths of humans exploring panoramic videos is a challenging task because of the spherical geometry and multimodality of the input and the inherent uncertainty and diversity of the output. Most previous methods fail to give a complete treatment of these characteristics and are thus prone to errors. In this paper, we present a simple new criterion for scanpath prediction based on principles from lossy data compression. This criterion suggests minimizing the expected code length of quantized scanpaths in a training set, which corresponds to fitting a discrete conditional probability model via maximum likelihood. Specifically, the probability model is conditioned on two modalities: a viewport sequence as the deformation-reduced visual input, and a set of relative historical scanpaths projected onto their respective viewports as the aligned path input. The probability model is parameterized by a product of discretized Gaussian mixture models to capture the uncertainty and diversity of scanpaths from different users. Most importantly, training the probability model does not rely on the specification of "ground-truth" scanpaths for imitation learning. We also introduce a proportional-integral-derivative (PID) controller-based sampler to generate realistic, human-like scanpaths from the learned probability model. Experimental results demonstrate that our method consistently produces better quantitative scanpath results in terms of prediction accuracy (by comparison with the assumed "ground-truths") and perceptual realism (through machine discrimination) over a wide range of prediction horizons. We additionally verify the improvement in perceptual realism via a formal psychophysical experiment, and the improvement in generalization on several unseen panoramic video datasets.
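The link between code length and maximum likelihood can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: the function names, the unit quantization step, and the hand-picked mixture parameters are all assumptions made for illustration, whereas in the method described above the mixture parameters would be produced by a model conditioned on the viewport sequence and the historical scanpaths. The sketch assigns each quantization bin the Gaussian-mixture mass that falls inside it, then sums the ideal code lengths -log2 p over a sequence of quantized symbols. Summing these lengths is the same as taking the negative log of a product of per-symbol probabilities, so minimizing it is maximum-likelihood fitting.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def discretized_gmm_pmf(k, weights, means, stds, bin_width=1.0):
    """Probability mass a Gaussian mixture assigns to the bin centered at k * bin_width.

    All parameters here are illustrative assumptions, not values from the paper.
    """
    center = k * bin_width
    lower = center - 0.5 * bin_width
    upper = center + 0.5 * bin_width
    return sum(
        w * (normal_cdf(upper, m, s) - normal_cdf(lower, m, s))
        for w, m, s in zip(weights, means, stds)
    )


def code_length_bits(symbols, weights, means, stds, bin_width=1.0):
    """Ideal code length in bits of a sequence of quantized symbols.

    Equals the negative base-2 log-likelihood under the discretized mixture.
    """
    total = 0.0
    for k in symbols:
        # Clamp to avoid log(0) for bins far in the tails.
        p = max(discretized_gmm_pmf(k, weights, means, stds, bin_width), 1e-12)
        total += -math.log2(p)
    return total


if __name__ == "__main__":
    # Hypothetical two-component mixture over one quantized coordinate.
    weights, means, stds = [0.5, 0.5], [-2.0, 3.0], [1.0, 1.5]
    likely = [-2, 3, -1, 2]      # symbols near the mixture modes
    unlikely = [10, -10, 9, -9]  # symbols far from both modes
    print(code_length_bits(likely, weights, means, stds))
    print(code_length_bits(unlikely, weights, means, stds))
```

Symbols near a mixture mode cost fewer bits than symbols in the tails, so a model that places mass where users actually look yields a shorter expected code length.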
