Dynamic scene understanding is one of the most prominent areas of interest in the computer vision community. To improve dynamic scene understanding, pixel-wise segmentation with neural networks is widely adopted. Recent work on pixel-wise segmentation combines semantic and motion information and has achieved strong performance. In this work, we propose a neural network architecture that accurately and efficiently generates moving object proposals (MOPs). We first train an unsupervised convolutional neural network (UnFlow) to estimate optical flow, and then feed the output of the optical flow network into a fully convolutional SegNet model. The main contributions of our work are (1) fine-tuning the pretrained optical flow model on the recently released DAVIS dataset, and (2) leveraging a fully convolutional neural network with an encoder-decoder architecture to segment moving objects. We implemented the code in TensorFlow and ran training and evaluation on an AWS EC2 instance.
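To make the two-stage pipeline concrete, the sketch below shows how a flow field could be passed to a SegNet-style encoder-decoder in TensorFlow 2.x / Keras. This is a minimal illustration, not the paper's implementation: `build_segnet` is a heavily reduced stand-in for the full SegNet model, and the random tensor is a placeholder for the flow that the fine-tuned UnFlow network would produce from a pair of consecutive DAVIS frames.

```python
# Minimal sketch of the flow-to-segmentation stage, assuming TensorFlow 2.x.
# The real pipeline would replace the placeholder input with UnFlow output.
import tensorflow as tf

def build_segnet(input_shape=(224, 224, 2), num_classes=2):
    """SegNet-style encoder-decoder over a 2-channel optical flow field."""
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: conv blocks with max pooling progressively shrink the map.
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    # Decoder: mirror the encoder with upsampling back to input resolution.
    x = tf.keras.layers.UpSampling2D()(x)
    x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.UpSampling2D()(x)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # Per-pixel class probabilities (moving vs. static).
    outputs = tf.keras.layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

segnet = build_segnet()
segnet.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Placeholder for a real (u, v) flow field from the fine-tuned UnFlow net.
flow = tf.random.normal((1, 224, 224, 2))
mask = segnet(flow)  # shape (1, 224, 224, 2): per-pixel moving-object scores
```

In the actual system, the two channels of the input encode the horizontal and vertical flow components, and the decoder's per-pixel softmax yields the moving object proposals.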