Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk: keystrokes can be inferred from typing acoustics, revealing sensitive information. Prior ASCA studies rely on small-scale datasets with limited diversity in users, keyboards, and environments, which constrains analysis across devices, microphones, and noise conditions. We introduce HEAR, a dataset designed to study ASCA along three axes: keyboard generalization, noise adaptation, and user bias. HEAR contains recordings from 53 participants using 37 laptop keyboards, collected in three realistic settings: (1) external microphone capture, (2) device microphone capture without network noise, and (3) VoIP-based streaming capture. This design enables controlled evaluation across users, keyboards, and environments. On HEAR, we establish an ASCA benchmark spanning conventional acoustic features and pre-trained representations of raw audio and spectrograms, in both unimodal and multimodal settings. We propose DECKER, a domain-invariant keystroke inference framework with four stages: (1) Keyboard Signature Normalization to reduce device coloration, (2) domain-adversarial disentanglement to suppress keyboard identity, (3) supervised cross-keyboard contrastive alignment to enforce key consistency, and (4) Acoustic Style Randomization to synthesize unseen keyboard responses. We further explore sentence-level inference with an LLM-based post-processing layer that refines keystroke sequences using linguistic context. Results on HEAR show that DECKER improves keystroke identification over strong baselines, particularly in cross-keyboard and cross-user settings, with further gains from language-model rectification. These findings highlight that ASCA remains effective across diverse users, devices, and noisy environments, underscoring its practical security risk.
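The abstract names Keyboard Signature Normalization and Acoustic Style Randomization without specifying how they work. A minimal sketch, assuming that device coloration behaves like a fixed linear filter (so it adds a constant vector to every keystroke's log-magnitude spectrum, the same premise as cepstral mean normalization in speech), is to subtract each keyboard's mean log-spectrum, and conversely to imitate an "unseen" keyboard by adding a random smooth gain curve. This is not the DECKER implementation: the function names `log_spectra`, `keyboard_signature_normalize`, and `random_style`, the Hann-windowed FFT front end, and the ±6 dB knot-interpolated gain curve are all hypothetical choices made for illustration.

```python
import numpy as np


def log_spectra(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Log-magnitude spectra of keystroke frames.

    frames: shape (n_keystrokes, frame_len); returns (n_keystrokes, frame_len // 2 + 1).
    Hypothetical front end: Hann window followed by a real FFT.
    """
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log(mag + eps)


def keyboard_signature_normalize(log_spec: np.ndarray, keyboard_ids: np.ndarray) -> np.ndarray:
    """Hypothetical KSN sketch: subtract each keyboard's mean log-spectrum.

    Assumption: a fixed coloration multiplies the magnitude spectrum, i.e. adds a
    constant per-keyboard vector in the log domain, so removing the mean cancels it.
    """
    out = np.empty_like(log_spec)
    for kb in np.unique(keyboard_ids):
        mask = keyboard_ids == kb
        out[mask] = log_spec[mask] - log_spec[mask].mean(axis=0, keepdims=True)
    return out


def random_style(log_spec: np.ndarray, rng: np.random.Generator,
                 max_gain_db: float = 6.0, n_knots: int = 6) -> np.ndarray:
    """Hypothetical style-randomization sketch: apply one random smooth gain curve.

    The same curve is added to every row, mimicking a single synthetic keyboard.
    """
    n_bins = log_spec.shape[1]
    knots_db = rng.uniform(-max_gain_db, max_gain_db, size=n_knots)
    curve_db = np.interp(np.arange(n_bins), np.linspace(0, n_bins - 1, n_knots), knots_db)
    # Convert dB gain to natural-log magnitude: ln(10 ** (dB / 20)) = dB * ln(10) / 20.
    return log_spec + curve_db * np.log(10.0) / 20.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((8, 512))       # 8 synthetic "keystroke" frames
    kb = np.array([0, 0, 0, 0, 1, 1, 1, 1])      # two keyboards, four strokes each
    spec = log_spectra(frames)
    recolored = spec.copy()
    recolored[kb == 1] = random_style(spec[kb == 1], rng)  # give keyboard 1 a new "style"
    same = np.allclose(keyboard_signature_normalize(spec, kb),
                       keyboard_signature_normalize(recolored, kb))
    print(same)  # True
```

The check at the end illustrates the intended division of labor under these assumptions: recoloring one keyboard changes its raw spectra but not its normalized ones, so a classifier trained on normalized features has less keyboard-specific coloration to latch onto, while style randomization exposes it to colorations that were never recorded.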