Patent classification aims to assign multiple International Patent Classification (IPC) codes to a given patent. Recent methods for automatic patent classification focus mainly on analyzing the text descriptions of patents. However, each patent is also associated with one or more assignees, and knowledge of the patents those assignees have previously applied for is often valuable for classification. Furthermore, the hierarchical taxonomy defined by the IPC system provides important contextual information and enables models to leverage correlations between IPC codes for more accurate classification. Existing methods fail to incorporate these two aspects. In this paper, we propose an integrated framework that comprehensively considers the information available on patents for patent classification. Specifically, we first present an IPC code correlation learning module that derives semantic representations of the codes by adaptively passing and aggregating messages within the same level and across different levels of the hierarchical taxonomy. Moreover, we design a historical application pattern learning component that incorporates the corresponding assignee's previous patents through a dual-channel aggregation mechanism. Finally, we combine the contextual information of patent texts, which contains the semantics of IPC codes, with assignees' sequential preferences to make predictions. Experiments on real-world datasets demonstrate the superiority of our approach over existing methods. In addition, we demonstrate the model's ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.