PaECTER is a publicly available, open-source document-level encoder built specifically for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations of patent documents. PaECTER performs better on similarity tasks than the current state-of-the-art models used in the patent domain. More specifically, on our patent citation prediction test dataset, our model outperforms the next-best patent-specific pre-trained language model (BERT for Patents) on two different rank evaluation metrics. On average, PaECTER places at least one of the most similar patents at rank 1.32 when it is ranked against 25 irrelevant patents. The numerical representations that PaECTER generates from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant to prior art search, for both inventors and patent examiners. PaECTER is available on Hugging Face.
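To make the rank figure concrete, the following is a minimal sketch of how the rank of the first relevant patent could be computed from document embeddings. It is an illustration under stated assumptions, not the evaluation code behind the reported results: the function name `first_relevant_rank`, the use of cosine similarity, and the toy three-dimensional vectors are all hypothetical stand-ins for real PaECTER embeddings and the actual test setup.

```python
import numpy as np


def first_relevant_rank(query, candidates, relevant):
    """Return the 1-based rank of the best-ranked relevant candidate.

    query:      1-D embedding of the focal patent.
    candidates: 2-D array, one embedding per candidate patent.
    relevant:   indices of the candidates that count as similar (e.g. cited).
    Cosine similarity is an assumption made for this sketch.
    """
    if len(relevant) == 0:
        raise ValueError("at least one relevant candidate is required")
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    # Candidate indices sorted from most to least similar.
    order = np.argsort(-scores, kind="stable")
    # Positions in the ranking that are occupied by relevant candidates.
    positions = np.flatnonzero(np.isin(order, relevant))
    return int(positions[0]) + 1


# Toy embeddings (hypothetical, for illustration only).
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.0],   # index 0: treated as the relevant (cited) patent
    [0.0, 1.0, 0.0],   # index 1: irrelevant
    [0.5, 0.5, 0.0],   # index 2: irrelevant
    [1.0, 0.05, 0.0],  # index 3: irrelevant, but closest to the query
])

rank_one_relevant = first_relevant_rank(query, candidates, [0])
rank_two_relevant = first_relevant_rank(query, candidates, [0, 3])
```

Averaging this rank over many focal patents, each compared against its relevant patents plus a fixed pool of irrelevant ones, would give a summary number of the same kind as the one quoted above; a value close to 1 means a relevant patent is almost always ranked first.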