LLM-based Extraction of Contradictions from Patents

Since the 1950s, TRIZ has shown that patents and the technical contradictions they resolve are an important source of inspiration for developing innovative products. However, TRIZ is a heuristic based on a historical patent analysis and does not draw on the ever-growing number of recent technological solutions in current patents. Because of the sheer number of patents, their length, and, not least, their complexity, modern patent retrieval and patent analysis need to go beyond keyword-oriented methods. Recent advances in patent retrieval and analysis mainly focus on dense vectors based on neural Transformer language models such as Google's BERT. These are used, for example, for dense retrieval, question answering, summarization, and key concept extraction. Within patent summarization and key concept extraction, one research focus is on generic inventive concepts, or TRIZ concepts, such as problems, solutions, advantages of the invention, parameters, and contradictions. Having succeeded rule-based approaches, fine-tuned BERT-like language models for sentence-wise classification represent the state of the art in inventive concept extraction. While they work comparatively well for basic concepts such as problems or solutions, contradictions, as a more complex abstraction, remain a challenge for these models. This paper goes one step further by presenting a method to extract TRIZ contradictions from patent texts based on prompt engineering with a generative Large Language Model (LLM), namely OpenAI's GPT-4. Contradiction detection, sentence extraction, contradiction summarization, parameter extraction, and assignment to the 39 abstract TRIZ engineering parameters are all performed in a single prompt using the LangChain framework. Our results show that "off-the-shelf" GPT-4 is a serious alternative to existing approaches.
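
To make the single-prompt idea concrete, the following is a minimal sketch of how one prompt could request all five steps at once and how the answer could be checked afterwards. It is not the prompt or code used in this paper: the prompt wording, the JSON field names, and the helpers `build_prompt` and `parse_response` are hypothetical, the parameter table is only an excerpt of the 39 TRIZ engineering parameters, and the call to GPT-4 through LangChain is deliberately left out, so the model reply shown in the usage note is a hand-written stand-in.

```python
import json
from string import Template

# Illustrative excerpt of the 39 TRIZ engineering parameters (number -> name).
# A real run would list all 39; this subset is an assumption for brevity.
TRIZ_PARAMETERS = {
    1: "Weight of moving object",
    9: "Speed",
    14: "Strength",
    17: "Temperature",
    27: "Reliability",
    39: "Productivity",
}

# Hypothetical prompt covering detection, sentence extraction, summarization,
# parameter extraction, and assignment to TRIZ parameters in one request.
PROMPT_TEMPLATE = Template("""You are a TRIZ expert analysing a patent text.
1. Decide whether the text describes a technical contradiction.
2. If so, quote the sentences that express it.
3. Summarize the contradiction in one sentence.
4. Name the improving and the worsening parameter.
5. Assign each to one of these TRIZ engineering parameters by number:
$parameters

Answer with JSON only, using the keys "contains_contradiction", "sentences",
"summary", "improving_parameter" and "worsening_parameter".

Patent text:
$patent_text
""")


def build_prompt(patent_text):
    """Fill the single prompt with the parameter list and the patent text."""
    parameters = "\n".join(
        f"{number}: {name}" for number, name in sorted(TRIZ_PARAMETERS.items())
    )
    return PROMPT_TEMPLATE.substitute(parameters=parameters, patent_text=patent_text)


def parse_response(raw):
    """Parse the model's JSON reply and validate the assigned parameter numbers."""
    data = json.loads(raw)
    if not data.get("contains_contradiction"):
        return {"contains_contradiction": False}
    result = dict(data)
    for key in ("improving_parameter", "worsening_parameter"):
        number = data.get(key)
        if number not in TRIZ_PARAMETERS:
            raise ValueError(f"{key}={number!r} is not a known TRIZ parameter number")
        # Attach the readable parameter name next to the number.
        result[f"{key}_name"] = TRIZ_PARAMETERS[number]
    return result
```

In use, `build_prompt(patent_text)` would be sent to the model, and the reply passed to `parse_response`. Validating the returned numbers against the parameter table is one simple way to catch replies that name a parameter outside the TRIZ list.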
