Because the number of patent applications grows rapidly each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Patents use different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective view) to illustrate the details of innovations. Classifying these images enables more efficient search and allows for further analysis. So far, datasets for image type classification lack some visualization types that are important for patents. Furthermore, related work does not make use of recent deep learning approaches, including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly labeled data for image perspectives. Experimental results demonstrate the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.