In patent prosecution, image-based retrieval systems that identify similarities between current patent images and prior art are pivotal to ensuring the novelty and non-obviousness of patent applications. Despite their growing popularity in recent years, existing approaches, while effective at recognizing images within the same patent, deliver little practical value because they generalize poorly when retrieving relevant prior art. Moreover, the task is inherently challenging because of the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information carried by image descriptions. We therefore propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating Large Language Models and improves performance on underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on the DeepPatent2 dataset show that our method achieves state-of-the-art or comparable performance in image-based patent retrieval, with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%. Furthermore, through an in-depth user analysis, we explore how our model aids patent professionals in their image retrieval efforts, highlighting its real-world applicability and effectiveness.