Drawings are a powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital art, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although large amounts of digitized drawings are available from comic books and cartoons, they vary widely in style, which necessitates expensive manual labeling to train domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup exploits large amounts of unlabeled data from the target domain when labels are available for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors from a vast number of labeled out-of-domain natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code is available at https://github.com/barisbatuhan/DASS_Detector.
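To make the teacher-student idea concrete, the following is a minimal sketch of a generic mean-teacher scheme, in which the teacher's weights track an exponential moving average (EMA) of the student's weights and only confident teacher detections are kept as pseudo-labels for unlabeled drawings. It is not the modified student update described in this work, whose details are not given here. The function names, the `decay` and `threshold` parameters, and the dictionary layout of a detection are all assumptions made for illustration.

```python
from typing import Dict, List


def ema_update(teacher: List[float], student: List[float],
               decay: float = 0.99) -> List[float]:
    """Return new teacher weights as an EMA of the student weights.

    Hypothetical helper: weights are plain lists of floats so the
    sketch stays framework-free.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same size")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def filter_pseudo_labels(detections: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep only teacher detections whose confidence reaches the threshold.

    Each detection is assumed to be a dict with a "score" key; the
    surviving detections would serve as pseudo-labels for the student.
    """
    return [d for d in detections if d["score"] >= threshold]


if __name__ == "__main__":
    teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.5)
    print(teacher)

    detections = [
        {"label": "face", "score": 0.9},
        {"label": "body", "score": 0.2},
    ]
    print(filter_pseudo_labels(detections, threshold=0.5))
```

In a scheme like this, a higher `decay` makes the teacher change more slowly, which tends to stabilize the pseudo-labels at the cost of slower adaptation to the target domain.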