Early screening of potential breakthrough technologies with enhanced interpretability: A patent-specific hierarchical attention network model

Machine learning approaches are useful for the early screening of potential breakthrough technologies, but their practicality is often hindered by opaque models. To address this, we propose an interpretable machine learning approach that predicts future citation counts from patent texts using a patent-specific hierarchical attention network (PatentHAN) model. Central to this approach are (1) a patent-specific pre-trained language model, which captures the meanings of technical words in patent claims, (2) a hierarchical network structure, which enables detailed analysis at the claim level, and (3) a claim-wise self-attention mechanism, which reveals the pivotal claims during the screening process. A case study of 35,376 pharmaceutical patents demonstrates the effectiveness of our approach in the early screening of potential breakthrough technologies while ensuring interpretability. Furthermore, we conduct additional analyses using different language models and claim types to examine the robustness of the approach. We expect the proposed approach to enhance expert-machine collaboration in identifying breakthrough technologies and to provide new insight, derived from text mining, into technological value.
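The claim-wise attention idea in point (3) can be illustrated with a minimal sketch. It is not the PatentHAN implementation, whose exact architecture is not given in the abstract. It assumes each claim has already been encoded as a fixed-length vector (for example, by a pre-trained language model) and that a context vector has been learned. The names `claim_attention`, `claim_vectors`, and `context` are hypothetical, and the toy numbers are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def claim_attention(claim_vectors, context):
    """Attention pooling over claim vectors (simplified, hypothetical).

    Assumption: the learned projection used in a full hierarchical
    attention network is replaced by a plain tanh, so the score of a
    claim is dot(tanh(claim), context).
    Returns (weights, patent_vector).
    """
    scores = [
        sum(math.tanh(x) * c for x, c in zip(vec, context))
        for vec in claim_vectors
    ]
    weights = softmax(scores)  # one weight per claim, summing to 1
    dim = len(context)
    # Patent-level representation: attention-weighted sum of claim vectors.
    patent_vector = [
        sum(w * vec[d] for w, vec in zip(weights, claim_vectors))
        for d in range(dim)
    ]
    return weights, patent_vector


# Toy example: three claims with two-dimensional embeddings (made-up values).
claims = [[0.1, 0.0], [2.0, 1.0], [0.3, -0.2]]
context = [1.0, 0.5]
weights, patent_vec = claim_attention(claims, context)

# Index of the claim with the largest attention weight.
top_claim = max(range(len(weights)), key=weights.__getitem__)
print("attention weights:", [round(w, 3) for w in weights])
print("most attended claim index:", top_claim)
```

In a setup like this, the attention weights are what an expert would inspect: the claim with the largest weight contributes most to the patent-level vector that feeds the citation-count predictor.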

