The migration to post-quantum cryptography is urgent for Internet of Things devices with 10- to 20-year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of ML-KEM (FIPS 203) and ML-DSA (FIPS 204) on ARM Cortex-M0+, measured on the RP2040 (Raspberry Pi Pico) at 133 MHz with 264 KB SRAM. Using the PQClean reference C implementations, we measure all three security levels of ML-KEM (512/768/1024) and ML-DSA (44/65/87) across key generation, encapsulation/signing, and decapsulation/verification. ML-KEM-512 completes a full key exchange in 36.3 ms while consuming 2.87 mJ, which is 17x faster and 94% lower in energy than ECDH P-256 on the same hardware. ML-DSA signing exhibits high latency variance due to rejection sampling (coefficient of variation 61-71%, with 99th-percentile latency up to 1,115 ms for ML-DSA-87). The M0+ incurs only a 1.8-1.9x slowdown relative to published Cortex-M4 results, despite lacking 64-bit multiply, DSP, and SIMD instructions. All code, data, and scripts are released as an open-source benchmark suite for reproducibility.
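The variance and energy figures quoted in the abstract can be related to raw measurements with a few lines of arithmetic. The following is a minimal sketch, not the paper's analysis scripts: the function names `summarize_latencies` and `average_power_mw` are hypothetical, the sample standard deviation and nearest-rank percentile are assumed conventions, and the latency values in the usage lines are made-up placeholders rather than measured data.

```python
import math
import statistics


def summarize_latencies(samples_ms):
    """Return mean, coefficient of variation (%), and 99th percentile.

    Assumptions (not taken from the paper): the sample standard
    deviation (n - 1 denominator) and the nearest-rank percentile.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples_ms)
    stdev = statistics.stdev(samples_ms)
    cv_percent = 100.0 * stdev / mean
    ordered = sorted(samples_ms)
    # Nearest-rank method: smallest value with at least 99% of samples at or below it.
    rank = math.ceil(0.99 * len(ordered))
    p99 = ordered[rank - 1]
    return {"mean_ms": mean, "cv_percent": cv_percent, "p99_ms": p99}


def average_power_mw(energy_mj, duration_ms):
    """Average power implied by an energy/time pair: mJ / ms = W, scaled to mW."""
    return 1000.0 * energy_mj / duration_ms


if __name__ == "__main__":
    # Placeholder signing latencies in ms (illustrative only, not measurements).
    fake_samples = [100.0, 200.0, 300.0, 400.0]
    print(summarize_latencies(fake_samples))
    # Average power implied by the abstract's ML-KEM-512 figures (36.3 ms, 2.87 mJ).
    print(average_power_mw(2.87, 36.3))
```

A coefficient of variation in the 61-71% range means the standard deviation of signing time is well over half the mean, which is why a tail statistic such as the 99th percentile is more useful than the mean when budgeting deadlines for ML-DSA signing.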