In this paper, we propose a new architecture, the Adaptive Graphical Model Network (AGMN), for 2D hand pose estimation from a monocular RGB image. The AGMN consists of two branches of deep convolutional neural networks (DCNNs) that compute unary and pairwise potential functions, followed by a graphical model inference module that integrates the unary and pairwise potentials. Unlike existing architectures that combine DCNNs with graphical models, the AGMN is novel in that the parameters of its graphical model are conditioned on, and fully adaptive to, each individual input image. Experiments show that our approach outperforms the state-of-the-art method for 2D hand keypoint estimation by a notable margin on two public datasets. Code is available at https://github.com/deyingk/agmn.
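To make the role of the inference module concrete, the following is a minimal sketch of how unary and pairwise potentials can be combined into per-keypoint marginals. It is not the released implementation. It assumes a tree-structured graphical model over keypoints, sum-product message passing, and strictly positive potentials, none of which is stated in the abstract. The function name `tree_marginals`, the flattened grid of `N` candidate positions, and the random arrays standing in for the outputs of the two DCNN branches are all hypothetical.

```python
import numpy as np


def tree_marginals(unary, pairwise, edges):
    """Sum-product message passing on a tree (illustrative sketch only).

    unary:    (K, N) array of positive scores; row k scores keypoint k
              at each of N candidate positions.
    pairwise: dict mapping an edge (i, j) to an (N, N) positive matrix
              whose entry [a, b] scores keypoint i at a and keypoint j at b.
    edges:    list of (parent, child) pairs forming a tree rooted at node 0,
              ordered so that every edge into a node precedes the edges
              leaving it.
    Returns a (K, N) array whose rows are the marginals of each keypoint.
    """
    collected = unary.astype(float)  # astype returns a copy
    up = {}

    # Leaf-to-root pass: each child sends its parent a message that sums
    # over the child's positions.
    for parent, child in reversed(edges):
        msg = pairwise[(parent, child)] @ collected[child]
        msg /= msg.sum()  # rescale for numerical stability
        up[(parent, child)] = msg
        collected[parent] *= msg

    # Root-to-leaf pass: each parent sends its child everything it knows
    # except what that child already told it.
    belief = collected.copy()
    for parent, child in edges:
        without_child = belief[parent] / up[(parent, child)]
        down = pairwise[(parent, child)].T @ without_child
        belief[child] = collected[child] * down / down.sum()

    return belief / belief.sum(axis=1, keepdims=True)


# Stand-in potentials. In an adaptive model, both arrays would be
# predicted from the input image rather than drawn at random.
rng = np.random.default_rng(0)
num_keypoints, num_positions = 4, 16
demo_edges = [(0, 1), (1, 2), (0, 3)]
demo_unary = rng.random((num_keypoints, num_positions)) + 0.1
demo_pairwise = {
    e: rng.random((num_positions, num_positions)) + 0.1 for e in demo_edges
}

marginals = tree_marginals(demo_unary, demo_pairwise, demo_edges)
predicted_positions = marginals.argmax(axis=1)  # one position index per keypoint
```

The adaptive aspect described above corresponds to `pairwise` changing from image to image. A model with fixed graphical-model parameters would reuse the same `pairwise` matrices for every input.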