Due to the increasing need for effective security measures and the integration of cameras in commercial products, a huge amount of visual data is created today. Law enforcement agencies (LEAs) are inspecting images and videos to find radicalization, propaganda for terrorist organizations and illegal products on darknet markets. This is time consuming. Instead of an undirected search, LEAs would like to adapt to new crimes and threats, and focus only on data from specific locations, persons or objects, which requires flexible interpretation of image content. Visual concept detection with deep convolutional neural networks (CNNs) is a crucial component to understand the image content. This paper has five contributions. The first contribution allows image-based geo-localization to estimate the origin of an image. CNNs and geotagged images are used to create a model that determines the location of an image by its pixel values. The second contribution enables analysis of fine-grained concepts to distinguish sub-categories in a generic concept. The proposed method encompasses data acquisition and cleaning and concept hierarchies. The third contribution is the recognition of person attributes (e.g., glasses or moustache) to enable query by textual description for a person. The person-attribute problem is treated as a specific sub-task of concept classification. The fourth contribution is an intuitive image annotation tool based on active learning. Active learning allows users to define novel concepts flexibly and train CNNs with minimal annotation effort. The fifth contribution increases the flexibility for LEAs in the query definition by using query expansion. Query expansion maps user queries to known and detectable concepts. Therefore, no prior knowledge of the detectable concepts is required for the users. The methods are validated on data with varying locations (popular and non-touristic locations), varying person attributes (CelebA dataset), and varying number of annotations.
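To illustrate how active learning can keep annotation effort low, the following minimal Python sketch ranks unlabeled images by the entropy of the classifier's predicted probabilities and returns the most uncertain ones for the user to label next. The abstract does not state which selection criterion the annotation tool uses; entropy-based uncertainty sampling is assumed here purely for illustration, and the function names, image identifiers and probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` image ids whose predictions are most uncertain.

    Assumption: uncertainty is measured by prediction entropy; other
    criteria (e.g. margin or least-confidence) would fit the same interface.
    """
    ranked = sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical CNN outputs for a novel concept: [P(present), P(absent)].
predictions = {
    "img_001.jpg": [0.98, 0.02],
    "img_002.jpg": [0.55, 0.45],
    "img_003.jpg": [0.80, 0.20],
    "img_004.jpg": [0.50, 0.50],
}

# Ask the user to label only the two images the model is least sure about.
to_label = select_for_annotation(predictions, budget=2)
print(to_label)
```

In such a loop, the newly labeled images would be added to the training set, the CNN retrained or fine-tuned, and the selection repeated until the concept detector is good enough for the user's purpose.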