We present AnonShield, a high-throughput, on-premises pseudonymization system that combines GPU-accelerated named entity recognition (NER), streaming processing, caching, and schema-aware configuration. Evaluated on datasets of up to 550 MB (70,951 records), AnonShield reduces processing time from over 92 hours to under 10 minutes (a speedup of up to 738×) while achieving up to 94.2% F1-score and 96.7% recall. Our results show that scalable pseudonymization of vulnerability data is feasible without sacrificing analytical utility, enabling compliant data sharing in operational CSIRT environments.