This PhD thesis studies and develops hierarchical representations for modeling and understanding spatio-temporal visual attention in video sequences. More specifically, we propose two computational models of visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Second, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention and is ultimately used to model attention in the temporal domain.