In object detection, non-maximum suppression (NMS) is widely adopted to remove duplicate horizontal boxes from dense detections and produce the final object instances. However, dense detection boxes are often of degraded quality, and context information is not explicitly exploited. As a result, existing NMS methods based on simple intersection-over-union (IoU) metrics tend to underperform when detecting multi-oriented and elongated objects. Unlike general NMS methods, which rely on duplicate removal, we propose a novel graph fusion network, named GFNet, for multi-oriented object detection. GFNet is extensible and adaptively fuses dense detection boxes to detect more accurate and holistic multi-oriented object instances. Specifically, we first adopt a locality-aware clustering algorithm to group dense detection boxes into clusters. We then construct an instance sub-graph for the detection boxes belonging to each cluster. Next, we propose a graph-based fusion network built on a Graph Convolutional Network (GCN), which learns to reason about and fuse the detection boxes to generate the final instance boxes. Extensive experiments on publicly available multi-oriented text datasets (MSRA-TD500, ICDAR2015, and ICDAR2017-MLT) and a multi-oriented object dataset (DOTA) verify the effectiveness and robustness of our method compared with general NMS methods in multi-oriented object detection.
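To make the pipeline concrete, the following is a minimal, hypothetical sketch of its three stages. The stages are IoU between oriented boxes, locality-aware grouping of dense boxes into clusters, and per-cluster fusion over an instance sub-graph. It is not GFNet. The assumptions are as follows. Boxes are `(cx, cy, w, h, theta)` tuples (theta in radians) that arrive in spatial (row-major) order, as dense per-pixel predictions do. Two boxes in a cluster share an edge when their IoU exceeds a threshold. The learned GCN is replaced by one parameter-free, row-normalized propagation step followed by a score-weighted average. All function names are illustrative.

```python
import numpy as np

def rbox_to_poly(box):
    """Corners of an oriented box (cx, cy, w, h, theta), counter-clockwise."""
    cx, cy, w, h, theta = box
    c, s = np.cos(theta), np.sin(theta)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners]

def area(poly):
    """Shoelace formula; degenerate polygons have zero area."""
    if len(poly) < 3:
        return 0.0
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])))

def _inside(p, a, b):
    # True if p lies on or left of the directed edge a->b (inside a CCW polygon).
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

def _intersect(p, q, a, b):
    # Intersection of segment pq with the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def clip_polygon(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by the convex CCW polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            cur, prev = inp[j], inp[j - 1]
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
    return output

def rotated_iou(a, b):
    """IoU of two oriented boxes via exact convex-polygon intersection."""
    pa, pb = rbox_to_poly(a), rbox_to_poly(b)
    inter = area(clip_polygon(pa, pb))
    union = area(pa) + area(pb) - inter
    return inter / union if union > 0 else 0.0

def locality_aware_clusters(boxes, iou_thr=0.5):
    """Group consecutive overlapping boxes; assumes boxes arrive in spatial order."""
    clusters = []
    for i, box in enumerate(boxes):
        if clusters and rotated_iou(boxes[clusters[-1][-1]], box) >= iou_thr:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters

def fuse_cluster(boxes, scores, iou_thr=0.5):
    """Stand-in for learned GCN fusion: one propagation step over the sub-graph,
    then a score-weighted average. Averaging theta assumes no angle wrap-around."""
    X = np.asarray(boxes, dtype=float)  # node features: (cx, cy, w, h, theta)
    n = len(X)
    # Adjacency with self-loops: edge when IoU >= threshold (the "A + I" of a GCN).
    A = np.array([[1.0 if i == j or rotated_iou(X[i], X[j]) >= iou_thr else 0.0
                   for j in range(n)] for i in range(n)])
    H = (A / A.sum(axis=1, keepdims=True)) @ X  # D^-1 (A + I) X, no learned weights
    w = np.asarray(scores, dtype=float)
    return (w / w.sum()) @ H  # convex combination of propagated node features

def graph_fusion_sketch(boxes, scores, iou_thr=0.5):
    """Cluster dense boxes, then fuse each cluster into one instance box."""
    return [fuse_cluster([boxes[i] for i in c], [scores[i] for i in c], iou_thr)
            for c in locality_aware_clusters(boxes, iou_thr)]
```

For example, the call `graph_fusion_sketch([(0, 0, 2, 2, 0), (0.2, 0, 2, 2, 0), (10, 10, 2, 2, 0)], [0.9, 0.1, 0.8])` groups the first two boxes into one cluster and keeps the distant third box on its own. The result is one fused box per cluster. Row normalization (rather than the symmetric normalization used in standard GCNs) is chosen here so that, without learned weights, each propagated node stays a convex combination of real boxes.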