The migration to post-quantum cryptography is urgent for Internet of Things devices with 10–20-year lifespans, yet no systematic benchmarks of the finalised NIST standards exist for the most constrained class of 32-bit processors. This paper presents the first isolated, algorithm-level benchmarks of ML-KEM (FIPS 203) and ML-DSA (FIPS 204) on the ARM Cortex-M0+, measured on the RP2040 (Raspberry Pi Pico) running at 133 MHz with 264 KB of SRAM. Using the PQClean reference C implementations, we measure all three security levels of ML-KEM (512/768/1024) and ML-DSA (44/65/87) across key generation, encapsulation/signing, and decapsulation/verification. ML-KEM-512 completes a full key exchange in 35.7 ms at an estimated energy cost of 2.83 mJ (based on a datasheet power model), which is 17x faster than a complete ECDH P-256 key agreement on the same hardware. ML-DSA signing exhibits high latency variance due to rejection sampling, with a coefficient of variation of 66–73% and a 99th-percentile latency of up to 1,125 ms for ML-DSA-87. The M0+ incurs only a 1.8–1.9x slowdown relative to published Cortex-M4 reference C results, despite lacking 64-bit multiply, DSP, and SIMD instructions. Because those M4 results were compiled with -O3 and ours with -Os, the ratio also absorbs any optimisation-level advantage on the M4 side, making it a conservative upper bound on the microarchitectural penalty. All code, data, and scripts are released as an open-source benchmark suite to support reproducibility.
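Under a datasheet power model, energy is estimated as average power multiplied by latency. The following is a minimal sketch that rests on two assumptions not stated above: the model uses a single constant power draw for the whole exchange, and "full key exchange" means key generation plus encapsulation plus decapsulation. Under those assumptions, the two reported ML-KEM-512 figures imply the average power such a model would use. The helper names are hypothetical.

```python
def energy_mj(power_mw: float, latency_ms: float) -> float:
    """Energy under a constant-power model: mW x ms = microjoules, so divide by 1000 for mJ."""
    return power_mw * latency_ms / 1000.0


def implied_power_mw(energy_mj_value: float, latency_ms: float) -> float:
    """Invert the model: mJ / ms = W, so multiply by 1000 for mW."""
    return energy_mj_value / latency_ms * 1000.0


# Reported ML-KEM-512 full-key-exchange figures; the constant-power assumption is ours.
exchange_ms = 35.7
exchange_mj = 2.83

power = implied_power_mw(exchange_mj, exchange_ms)
print(f"implied average power: {power:.1f} mW")
print(f"round-trip check: {energy_mj(power, exchange_ms):.2f} mJ")
```

This prints an implied average power of about 79.3 mW. Under such a model, energy scales linearly with every latency figure, so the relative speedups carry over directly to relative energy savings.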
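The variance figures for ML-DSA signing are summary statistics over repeated runs. The sketch below assumes that per-run latencies in milliseconds have already been collected. It also assumes the sample standard deviation and a nearest-rank percentile, since the estimators are not specified here. The sample values are invented for illustration and are not measurements.

```python
import math
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (0.70 means 70%)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative signing latencies in ms (hypothetical, NOT measured data).
latencies_ms = [210.0, 180.0, 450.0, 320.0, 190.0, 800.0, 260.0, 205.0]

cv = coefficient_of_variation(latencies_ms)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(f"CV = {cv:.0%}, p99 = {p99:.0f} ms")
```

With only a handful of samples, the 99th percentile collapses to the maximum. Tail figures such as a p99 are meaningful only with many hundreds of runs per parameter set.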
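Rejection sampling explains why signing latency varies so much. If each signing attempt is accepted with probability p, the number of attempts K is geometrically distributed, with mean 1/p and standard deviation sqrt(1 - p)/p. The minimal model below assumes latency is a fixed cost plus a constant cost per attempt. It computes an analytic CV and checks it by Monte Carlo simulation. The cost and probability values are hypothetical and are not taken from the measurements or from FIPS 204.

```python
import math
import random
import statistics


def analytic_latency_cv(fixed_ms, per_attempt_ms, accept_prob):
    """CV of T = fixed_ms + per_attempt_ms * K, where K ~ Geometric(accept_prob) on {1, 2, ...}."""
    mean_k = 1.0 / accept_prob
    sd_k = math.sqrt(1.0 - accept_prob) / accept_prob
    return per_attempt_ms * sd_k / (fixed_ms + per_attempt_ms * mean_k)


def simulate_latency(fixed_ms, per_attempt_ms, accept_prob, rng):
    """One simulated signature: retry the attempt until it is accepted."""
    attempts = 1
    while rng.random() >= accept_prob:  # rejected with probability 1 - accept_prob
        attempts += 1
    return fixed_ms + per_attempt_ms * attempts


# Hypothetical parameters for illustration only.
FIXED_MS, PER_ATTEMPT_MS, ACCEPT_PROB = 20.0, 10.0, 0.25

rng = random.Random(1)
samples = [simulate_latency(FIXED_MS, PER_ATTEMPT_MS, ACCEPT_PROB, rng) for _ in range(20000)]
simulated_cv = statistics.stdev(samples) / statistics.mean(samples)
print(f"analytic CV  = {analytic_latency_cv(FIXED_MS, PER_ATTEMPT_MS, ACCEPT_PROB):.3f}")
print(f"simulated CV = {simulated_cv:.3f}")
```

The fixed cost dilutes the variance. The attempt count alone has a CV of sqrt(1 - p), and any per-signature work outside the retry loop pulls the latency CV below that value.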