Patent landscaping is the process of identifying all patents related to a particular technology area, and it is important for assessing various aspects of the intellectual property context. Traditionally, constructing patent landscapes has been intensely laborious and expensive, and the rapid expansion of patenting activity in recent decades has driven an increasing need for efficient and effective automated patent landscaping approaches. In particular, it is critical to be able to construct patent landscapes from a minimal number of labeled examples, as labeling patents for a narrow technology area requires highly specialized (and hence expensive) technical knowledge. We present an automated neural patent landscaping system that demonstrates significantly improved performance on difficult examples (0.69 $F_1$ on 'hard' examples, versus 0.6 for previously reported systems), and that also achieves significant improvements with much less training data (overall 0.75 $F_1$ with as few as 24 examples). Furthermore, acquiring good data for evaluating such automated landscaping systems is a challenge; we demonstrate a higher-quality training data generation procedure that merges Abood and Feltenberger's (2018) "seed/anti-seed" approach with active learning to collect difficult labeled examples near the decision boundary. Using this procedure, we created a new dataset of labeled AI patents for training and testing. As in prior work, we compare our approach with a number of baseline systems, and we release our code and data for others to build upon.
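
To illustrate what collecting examples "near the decision boundary" can mean in practice, the following is a minimal sketch of uncertainty sampling, one common active-learning heuristic. It is not taken from the released code: the function name `select_uncertain`, the 0.5 boundary, and the example scores are all assumptions chosen for illustration, and the actual selection procedure may differ.

```python
from typing import List, Sequence


def select_uncertain(probabilities: Sequence[float], k: int,
                     boundary: float = 0.5) -> List[int]:
    """Return indices of the k examples scored closest to the boundary.

    `probabilities` holds a classifier's predicted probability that each
    unlabeled patent belongs to the landscape (hypothetical scores here).
    Examples nearest `boundary` are the ones the model is least sure
    about, so they are the most informative to send to a human labeler.
    """
    # Rank every index by its distance from the decision boundary.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - boundary))
    return ranked[:k]


# Hypothetical model scores for five unlabeled patents.
scores = [0.95, 0.52, 0.10, 0.45, 0.70]
to_label = select_uncertain(scores, k=2)
print(to_label)  # indices of the two most ambiguous patents
```

In a loop of this kind, the selected patents would be labeled by an expert, added to the training set, and the classifier retrained before the next round of selection.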