The citation network of patents citing prior art arises from the legal obligation of patent applicants to properly disclose their invention. One way to study the relationship between current patents and their antecedents is to analyze the similarity between their textual elements. Many patent similarity indicators have declined steadily since the mid-1970s. Although several explanations have been proposed, comprehensive analyses of this phenomenon remain rare. In this paper, we use a computationally efficient measure of patent similarity that leverages state-of-the-art Natural Language Processing tools to investigate potential drivers of this apparent decrease. To do so, we model patent similarity scores with generalized additive models. We find that non-linear model specifications can distinguish between distinct, temporally varying drivers of patent similarity levels and explain more variation in the data ($R^2\sim 18\%$) than previous methods. Moreover, the model reveals an underlying trend in similarity scores that is fundamentally different from the one presented previously.