IT systems face a growing number of security threats, including advanced persistent threats (APTs) and vulnerabilities arising from future quantum computing. The move towards crypto-agility and post-quantum cryptography (PQC) requires a reliable inventory of cryptographic assets across heterogeneous IT environments. Because of the sheer number of software packages involved, manually identifying cryptographically relevant software is infeasible. Furthermore, static code analysis pipelines often fail to cope with the diversity of modern software ecosystems. Our research explores the use of large language models (LLMs) as heuristic tools for cryptographic asset discovery. We propose a collaborative framework in which multiple LLMs assess the cryptographic relevance of each package and their outputs are aggregated by majority voting. To preserve data privacy, the approach runs on-premises without relying on external servers. Using more than 65,000 Fedora Linux packages, we evaluate the reliability of this method through statistical analysis, inter-model agreement, and manual validation. Preliminary results suggest that LLM ensembles can serve as an efficient first-pass filter for identifying cryptographic software, reducing manual workload and supporting the PQC transition. The study also compares on-premises and online LLM configurations, highlighting key advantages, limitations, and future directions for automated cryptographic asset discovery.
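The aggregation step can be sketched minimally as follows, assuming each model returns one categorical label per package. The label strings (`relevant`, `not_relevant`, `undecided`), the function names, the example package and model names, and the strict-majority rule that routes ties to manual review are illustrative assumptions, not the framework's actual interface.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(labels: List[str]) -> str:
    """Return the label chosen by a strict majority of models, else 'undecided'.

    Assumption: a label needs more than half of all votes; pluralities and
    ties are left 'undecided' so they can be passed on to manual review.
    """
    if not labels:
        return "undecided"
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else "undecided"


def aggregate(votes: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each package to its ensemble label.

    `votes` maps a package name to a dict of {model name: label}
    (hypothetical data layout).
    """
    return {pkg: majority_vote(list(per_model.values()))
            for pkg, per_model in votes.items()}


if __name__ == "__main__":
    # Hypothetical votes from three local models for two packages.
    example = {
        "openssl": {"model_a": "relevant", "model_b": "relevant",
                    "model_c": "not_relevant"},
        "cowsay": {"model_a": "not_relevant", "model_b": "not_relevant",
                   "model_c": "not_relevant"},
    }
    print(aggregate(example))
```

Requiring a strict majority rather than a plurality keeps split decisions out of the automatic filter, which matches the first-pass role described above: confident verdicts pass through, and disagreements go to a human reviewer.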