To address the insufficient generation quality of traditional patent abstract generation models, which draw only on the patent specification, the out-of-vocabulary (OOV) problem for new terminology caused by rapid patent updates, and the information redundancy that arises when the high professionalism, accuracy, and uniqueness of patent texts are not adequately considered, we propose a patent text abstract generation model based on a master-slave encoder architecture (MSEA). First, MSEA designs a master-slave encoder that takes both the specification and the claims of a patent as input, and fully exploits the features and details shared between the two. Second, building on a pointer network, the model strengthens its treatment of new technical terms in the input sequence, and further enhances its correlation with the input text by re-weighting the "remembered" and "forgotten" parts of the input sequence from the encoder. Finally, an enhanced repetition suppression mechanism tailored to patent texts is introduced to ensure that the generated abstracts are accurate and non-redundant. On a publicly available patent text dataset, MSEA improves Rouge-1, Rouge-2, and Rouge-L scores by 0.006, 0.005, and 0.005, respectively, over the state-of-the-art Improved Multi-Head Attention Mechanism (IMHAM) model. By leveraging the characteristics of patent texts, MSEA effectively improves the quality of generated patent abstracts, demonstrating its advancement and effectiveness in the experiments.
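The pointer-network component mentioned above mitigates OOV terminology by letting the decoder copy tokens directly from the source. The following is a minimal NumPy sketch of that general pointer-generator mixture, not the authors' exact MSEA formulation; the function name, argument layout, and the scalar `p_gen` gate are illustrative assumptions.

```python
import numpy as np

def pointer_mixture(p_vocab, attention, src_ids, p_gen, vocab_size):
    """Blend generation and copy distributions (pointer-network sketch).

    p_vocab:   (vocab_size,) softmax over the fixed vocabulary
    attention: (src_len,) attention weights over source tokens
    src_ids:   (src_len,) extended-vocabulary ids of source tokens;
               ids >= vocab_size denote OOV terms (e.g. new patent terminology)
    p_gen:     scalar in [0, 1], probability of generating vs. copying
    """
    extended = int(max(vocab_size, src_ids.max() + 1))
    p_final = np.zeros(extended)
    # Generation path: scaled vocabulary distribution.
    p_final[:vocab_size] = p_gen * p_vocab
    # Copy path: attention mass flows to source positions, including
    # OOV slots, so novel terms remain producible in the summary.
    np.add.at(p_final, src_ids, (1.0 - p_gen) * attention)
    return p_final

# Example: source token with extended id 3 is OOV (id >= vocab_size),
# yet receives nonzero probability via the copy distribution.
p = pointer_mixture(np.array([0.5, 0.3, 0.2]),  # vocab softmax
                    np.array([0.6, 0.4]),       # attention over source
                    np.array([1, 3]),           # second source token is OOV
                    p_gen=0.5, vocab_size=3)
# p[3] > 0 even though id 3 is outside the fixed vocabulary.
```

Because both paths are convex-combined by `p_gen`, the result is still a valid probability distribution over the extended vocabulary.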