DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing sensitive information. Prior ASCA studies are limited by small-scale datasets with restricted diversity in users, keyboards, and environments, constraining analysis across devices, microphones, and noise conditions. We introduce HEAR, a dataset designed to study ASCA along three axes: keyboard generalization, noise adaptation, and user bias. HEAR contains recordings from 53 participants using 37 laptop keyboards, collected in three realistic settings: (1) external microphone capture, (2) device microphone capture without network noise, and (3) VoIP-based streaming capture. This enables controlled evaluation across users, keyboards, and environments. On HEAR, we establish an ASCA benchmark spanning conventional features and pre-trained representations from raw audio and spectrograms in unimodal and multimodal settings. We propose DECKER, a domain-invariant keystroke inference framework with four stages: (1) Keyboard Signature Normalization to reduce device coloration, (2) domain-adversarial disentanglement to suppress keyboard identity, (3) supervised cross-keyboard contrastive alignment to enforce key consistency, and (4) Acoustic Style Randomization to synthesize unseen keyboard responses. We further explore sentence-level inference using an LLM-based post-processing layer to refine keystroke sequences via linguistic context. Results on HEAR show DECKER improves keystroke identification over strong baselines, particularly in cross-keyboard and cross-user settings, with further gains from language-model rectification. These findings highlight that ASCA remains effective across diverse users, devices, and noisy environments, underscoring its practical security risk.
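Stage (3), supervised cross-keyboard contrastive alignment, can be illustrated with a small sketch. The abstract does not give the exact loss, so the code below assumes a SupCon-style formulation in which an anchor's positives are samples with the same key label recorded on a different keyboard. The function name `cross_keyboard_supcon_loss`, its parameters, and the default temperature are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def cross_keyboard_supcon_loss(embeddings, key_labels, keyboard_ids, temperature=0.1):
    """Illustrative supervised contrastive loss (assumed form, not from the paper).

    Positives for each anchor: same key label, different keyboard.
    Anchors without any such positive are skipped.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    keys = np.asarray(key_labels)
    boards = np.asarray(keyboard_ids)
    n = z.shape[0]

    # Positive pairs: same key, recorded on a different keyboard.
    pos = (keys[:, None] == keys[None, :]) & (boards[:, None] != boards[None, :])
    pos_count = pos.sum(axis=1)
    has_pos = pos_count > 0
    if n < 2 or not has_pos.any():
        return 0.0

    # L2-normalize so that dot products are cosine similarities.
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z / norms

    # Temperature-scaled similarities, with self-similarity masked out.
    sim = (z @ z.T) / temperature
    sim = np.where(np.eye(n, dtype=bool), -np.inf, sim)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_denom

    # Average log-probability of the positives for each valid anchor.
    sum_pos = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_pos[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())
```

Under this assumed formulation, the loss is small when embeddings of the same key from different keyboards coincide and large when embeddings cluster by keyboard instead, which is the key-consistency behavior the stage is described as enforcing.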
