LLM-powered Real-time Patent Citation Recommendation for Financial Technologies

Rapid financial innovation has been accompanied by a sharp increase in patenting activity, making timely and comprehensive prior-art discovery more difficult. This problem is especially evident in financial technologies, where innovations develop quickly, patent collections grow continuously, and citation recommendation systems must be updated as new applications arrive. Existing patent retrieval and citation recommendation methods typically rely on static indexes or periodic retraining, which limits their ability to operate effectively in such dynamic settings. In this study, we propose a real-time patent citation recommendation framework designed for large and fast-changing financial patent corpora. Using a dataset of 428,843 financial patents granted by the China National Intellectual Property Administration (CNIPA) between 2000 and 2024, we build a three-stage recommendation pipeline. The pipeline uses large language model (LLM) embeddings to represent the semantic content of patent abstracts, applies efficient approximate nearest-neighbor search to construct a manageable candidate set, and ranks candidates by semantic similarity to produce top-k citation recommendations. In addition to improving recommendation accuracy, the proposed framework directly addresses the dynamic nature of patent systems. By using an incremental indexing strategy based on hierarchical navigable small-world (HNSW) graphs, newly issued patents can be added without rebuilding the entire index. A rolling day-by-day update experiment shows that incremental updating improves recall while substantially reducing computational cost compared with rebuild-based indexing. The proposed method also consistently outperforms traditional text-based baselines and alternative nearest-neighbor retrieval approaches.
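The three stages and the incremental update described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the class `IncrementalIndex` and the function `recommend` are hypothetical names, the three-dimensional vectors stand in for LLM embeddings of patent abstracts, and an exhaustive cosine scan stands in for HNSW approximate nearest-neighbor search. The sketch only shows the control flow: embed, retrieve a candidate set, re-rank by semantic similarity, and add new patents without rebuilding the index.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit-length."""
    return sum(a * b for a, b in zip(u, v))


class IncrementalIndex:
    """Hypothetical toy index. An exhaustive scan stands in for HNSW search."""

    def __init__(self):
        self.vectors = {}  # patent id -> unit-length embedding

    def add(self, patent_id, embedding):
        # Incremental update: a new patent is inserted without touching
        # or rebuilding the entries that are already indexed.
        self.vectors[patent_id] = normalize(embedding)

    def candidates(self, query, n_candidates):
        # Stage 2: return the ids of the n_candidates closest patents.
        q = normalize(query)
        scored = sorted(
            ((cosine(q, v), pid) for pid, v in self.vectors.items()),
            reverse=True,
        )
        return [pid for _, pid in scored[:n_candidates]]


def recommend(index, query_embedding, k=2, n_candidates=10):
    """Stage 3: re-rank the candidate set by similarity and keep the top k."""
    q = normalize(query_embedding)
    cands = index.candidates(query_embedding, n_candidates)
    ranked = sorted(cands, key=lambda pid: cosine(q, index.vectors[pid]), reverse=True)
    return ranked[:k]


# Stage 1 (assumed): embeddings would come from an LLM; toy vectors are used here.
index = IncrementalIndex()
index.add("P1", [1.0, 0.0, 0.0])
index.add("P2", [0.9, 0.1, 0.0])
index.add("P3", [0.0, 1.0, 0.0])

query = [1.0, 0.05, 0.0]
before = recommend(index, query, k=2)

# A newly issued patent arrives and is added in place, with no rebuild.
index.add("P4", [1.0, 0.05, 0.0])
after = recommend(index, query, k=2)
```

In this toy setting the recommendations change as soon as `P4` is added, which is the behavior the incremental strategy is meant to provide. A production system would replace the exhaustive scan with an HNSW index, so the candidate stage would be approximate rather than exact.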
