Gendered Words and Grant Rates: A Textual Analysis of Disparate Outcomes in the Patent System

Text is a vehicle for information, and it reflects the writer's linguistic style and communicative patterns. By studying these attributes, we can uncover latent insights about an author and the underlying message. This article applies that approach to patent applications and their inventors. While prior research focuses on patent metadata, we use machine learning and natural language processing to extract hidden information from the words of the applications themselves. Through these methods, we find that inventor gender can often be identified from textual attributes, even without knowing the inventor's name. This ability to discern gender from text suggests that anonymized patent examination, often proposed as a way to mitigate disparities in patent grant rates, may not fully address gendered outcomes in securing a patent. Our study also investigates whether objective features of a patent application can predict whether it will be granted. Using a classifier algorithm, we correctly predicted whether a patent was granted over 60% of the time. Further analysis showed that writing style, such as vocabulary and sentence complexity, disproportionately influenced grant predictions relative to other attributes such as inventor gender and subject matter keywords. Lastly, we examine whether women disproportionately invent in technological areas with higher rejection rates. Using a clustering algorithm, we grouped applications by related subject matter. We found that 85% of female-dominated clusters have abnormally high rejection rates, compared to only 45% of male-dominated clusters. These findings highlight complex interactions between textual choices, gender, and success in securing a patent. They also raise questions about whether current proposals will be sufficient to achieve gender equity and efficiency in the patent system.
