Patent examination remains an open challenge in the NLP literature even after the advent of large language models (LLMs), because it requires extensive yet nuanced human judgment on whether a submitted claim meets the statutory standards of novelty and non-obviousness against previously granted claims (prior art) in expert domains. Previous NLP studies have approached this challenge as a prediction task (e.g., forecasting grant outcomes) using high-level proxies such as similarity metrics or classifiers trained on historical labels. However, this approach often overlooks the step-by-step evaluations that examiners must make using rich information, including the rationales for decisions given in office action documents, which in turn makes it harder to measure the current state of techniques for patent review. To fill this gap, we construct PANORAMA, a dataset of 8,143 U.S. patent examination records that preserves the full decision trails, including original applications, all cited references, Non-Final Rejections, and Notices of Allowance. PANORAMA also decomposes these trails into sequential benchmarks that emulate patent professionals' review processes and allow researchers to examine LLMs' capabilities at each step. Our findings indicate that, although LLMs are relatively effective at retrieving relevant prior art and pinpointing the pertinent paragraphs, they struggle to assess the novelty and non-obviousness of patent claims. We discuss these results and argue that advancing NLP, including LLMs, in the patent domain requires a deeper understanding of real-world patent examination. Our dataset is openly available at https://huggingface.co/datasets/LG-AI-Research/PANORAMA.