SALSA PICANTE: a machine learning attack on LWE with binary secrets

Learning with Errors (LWE) is a hard math problem underpinning many proposed post-quantum cryptographic (PQC) systems. The only PQC Key Exchange Mechanism (KEM) standardized by NIST is based on module LWE, and current publicly available PQ Homomorphic Encryption (HE) libraries are based on ring LWE. The security of LWE-based PQ cryptosystems is critical, but certain implementation choices could weaken them. One such choice is sparse binary secrets, desirable for PQ HE schemes for efficiency reasons. Prior work, SALSA, demonstrated a machine learning-based attack on LWE with sparse binary secrets in small dimensions ($n \le 128$) and low Hamming weights ($h \le 4$). However, this attack assumes access to millions of eavesdropped LWE samples and fails at higher Hamming weights or dimensions. We present PICANTE, an enhanced machine learning attack on LWE with sparse binary secrets, which recovers secrets in much larger dimensions (up to $n=350$) and with larger Hamming weights (roughly $n/10$, and up to $h=60$ for $n=350$). We achieve this dramatic improvement via a novel preprocessing step, which allows us to generate training data from a linear number of eavesdropped LWE samples ($4n$) and changes the distribution of the data to improve transformer training. We also improve the secret recovery methods of SALSA and introduce a novel cross-attention recovery mechanism allowing us to read off the secret directly from the trained models. While PICANTE does not threaten NIST's proposed LWE standards, it demonstrates significant improvement over SALSA and could scale further, highlighting the need for future investigation into machine learning attacks on LWE with sparse binary secrets.
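To make the attack setting concrete, the following minimal sketch generates the kind of data the abstract refers to: $4n$ LWE samples $(a, b)$ with $b = \langle a, s \rangle + e \bmod q$, where $s$ is a binary secret of Hamming weight $h \approx n/10$. It illustrates the problem instance only, not the PICANTE preprocessing or attack. The modulus `q`, the error width `sigma`, the rounded-Gaussian error, the toy dimension `n = 30`, and all function names are assumptions chosen for illustration; none of them come from the abstract.

```python
import random

def sparse_binary_secret(n, h, rng):
    """Return a length-n 0/1 vector with exactly h ones (Hamming weight h)."""
    secret = [0] * n
    for i in rng.sample(range(n), h):
        secret[i] = 1
    return secret

def lwe_samples(secret, m, q, sigma, rng):
    """Return m pairs (a, b) with b = <a, secret> + e mod q.

    The error e is a rounded Gaussian of width sigma (an assumed choice).
    """
    n = len(secret)
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = round(rng.gauss(0, sigma))
        b = (sum(ai * si for ai, si in zip(a, secret)) + e) % q
        samples.append((a, b))
    return samples

def centered(x, q):
    """Map x mod q to its representative centered around zero."""
    x %= q
    return x - q if x > q // 2 else x

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n, h = 30, 3             # toy dimension; h is roughly n/10 as in the abstract
q, sigma = 251, 3.0      # hypothetical modulus and error width
secret = sparse_binary_secret(n, h, rng)
samples = lwe_samples(secret, 4 * n, q, sigma, rng)  # 4n eavesdropped samples

# With the true secret, every residual b - <a, s> mod q is a small error term.
residuals = [
    centered(b - sum(ai * si for ai, si in zip(a, secret)), q)
    for a, b in samples
]
```

An attacker sees only `samples`; the residual check at the end is what distinguishes the correct secret from a wrong guess, since a wrong guess yields residuals spread roughly uniformly modulo `q`.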
