This study investigates a primary inaudible attack vector against Amazon Alexa voice services that uses near-ultrasound trojans, with a focus on characterizing the attack surface and examining the practical implications of issuing inaudible voice commands. The research maps each attack vector to a tactic or technique from the MITRE ATT&CK matrix, covering the Enterprise, Mobile, and Industrial Control System (ICS) frameworks. In the experiment, fifty near-ultrasonic audio samples were generated and surveyed to assess the attacks' effectiveness: unprocessed commands had a 100% success rate, and processed commands achieved a 58% overall success rate. This systematic approach exercises previously unaddressed attack surfaces, supporting comprehensive detection and attack design, and pairs each ATT&CK identifier with a tested defensive method, providing attack and defense tactics for prompt-response options. The attack method employs Single Upper Sideband Amplitude Modulation (SUSBAM) to generate near-ultrasonic audio from audible sources, shifting spoken commands into a frequency range beyond adult human hearing. By eliminating the lower sideband, the design achieves a minimum bandwidth of 6 kHz within the 16-22 kHz range while remaining inaudible after transformation. The research also investigates the one-to-many attack surface, in which a single device simultaneously triggers multiple actions or devices. Additionally, the study demonstrates that the inaudible signal can be reversed (demodulated), which suggests potential alerting methods and the possibility of embedding secret messages, as in audio steganography.
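The upper-sideband modulation and its demodulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the textbook analytic-signal (Hilbert) method for single-sideband modulation, and the 48 kHz sample rate, 16 kHz carrier, and all function names (`band_limit`, `usb_modulate`, `usb_demodulate`) are assumptions chosen to match the 16-22 kHz range and 6 kHz bandwidth quoted above.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of real x (negative frequencies removed)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist bin once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def band_limit(x, sample_rate, cutoff_hz):
    """Zero all spectral content above cutoff_hz (ideal low-pass, assumed here)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def usb_modulate(baseband, sample_rate, carrier_hz):
    """Shift baseband up by carrier_hz, keeping only the upper sideband."""
    t = np.arange(len(baseband)) / sample_rate
    shifted = analytic_signal(baseband) * np.exp(2j * np.pi * carrier_hz * t)
    return np.real(shifted)


def usb_demodulate(passband, sample_rate, carrier_hz):
    """Reverse usb_modulate: shift the upper sideband back down to baseband."""
    t = np.arange(len(passband)) / sample_rate
    shifted = analytic_signal(passband) * np.exp(-2j * np.pi * carrier_hz * t)
    return np.real(shifted)


if __name__ == "__main__":
    fs = 48_000                            # assumed sample rate (Nyquist 24 kHz)
    carrier = 16_000                       # assumed carrier at the band's lower edge
    t = np.arange(fs) / fs                 # one second of samples
    command = np.sin(2 * np.pi * 1_000 * t)        # stand-in for a spoken command
    command = band_limit(command, fs, 6_000)       # 6 kHz band maps to 16-22 kHz
    inaudible = usb_modulate(command, fs, carrier)
    recovered = usb_demodulate(inaudible, fs, carrier)
    peak_hz = int(np.argmax(np.abs(np.fft.rfft(inaudible))))
    print(peak_hz)                                 # 17000 (1 kHz tone + 16 kHz carrier)
    print(np.allclose(recovered, command, atol=1e-8))
```

In this sketch a 1 kHz test tone lands at 17 kHz with no mirror image at 15 kHz, which is the effect of discarding the lower sideband. The same frequency shift applied in reverse recovers the original signal, which is the reversibility the abstract points to as a basis for alerting.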