We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.
翻译:我们提出一个统一框架——深度依赖网络(DDNs),该框架融合了依赖网络与深度学习架构,专门针对多标签分类任务,特别聚焦于图像与视频数据。与马尔可夫网络等其他概率图模型相比,依赖网络的主要优势在于其训练简便性。特别是当与深度学习架构结合时,它能提供一种直观且易用的多标签分类损失函数。然而,相较于马尔可夫网络,DDNs的一个缺陷是缺乏高级推理方案,这迫使必须采用吉布斯采样。为解决这一挑战,我们提出了基于局部搜索与整数线性规划的新型推理方案,用于在给定观测数据下计算标签的最可能赋值。我们在三个视频数据集(Charades、TACoS、Wetlab)和三个图像数据集(MS-COCO、PASCAL VOC、NUS-WIDE)上评估了这些新方法,并将其性能与(a)基础神经网络架构和(b)结合了高级推理与学习技术的马尔可夫网络的神经网络架构进行了对比。实验结果表明,我们提出的新型DDN方法优于这两种竞争方法。