The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. Standard bibliometric approaches have been shown to reach their limits in such a fast-evolving domain. We therefore use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. Using a subset of arXiv preprints on cybersecurity as our data, we compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but our results show some potential for noun extractors. For this reason, we developed a noun extractor boosted with statistical analysis to extract specific and relevant compound nouns from the domain. We then tested our model on identifying trends in the LLM domain. We observe some limitations, but the approach offers promising results for monitoring the evolution of emergent trends.