Searchable symmetric encryption schemes often unintentionally disclose sensitive information, such as access, volume, and search patterns. Attackers can exploit these leakages, together with other available knowledge about the user's database, to recover queries. We find that the effectiveness of query recovery attacks depends on the volume/frequency distribution of keywords. Queries containing keywords with high volumes/frequencies are more susceptible to recovery, even when countermeasures are implemented. Attackers can also leverage these ``special'' queries to recover all others. Exploiting this finding, we propose a Jigsaw attack that begins by accurately identifying and recovering those distinctive queries. Leveraging volume, frequency, and co-occurrence information, our attack achieves $90\%$ accuracy on three tested datasets, which is comparable to previous attacks (Oya et al., USENIX '22 and Damie et al., USENIX '21). With the same runtime, our attack demonstrates an advantage over the attack proposed by Oya et al. (approximately $15\%$ higher accuracy when the keyword universe size is 15k). Furthermore, our attack outperforms existing attacks against widely studied countermeasures, achieving roughly $60\%$ and $85\%$ accuracy against padding and obfuscation, respectively. In this setting, with a large keyword universe ($\geq$3k), it surpasses current state-of-the-art attacks by more than $20\%$.
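To make the intuition concrete, the following is a minimal sketch of why high-volume queries are easy to single out. It is not the Jigsaw attack itself: it uses volume leakage only, ignores frequency and co-occurrence, and assumes the attacker holds approximate per-keyword volumes from auxiliary knowledge. The function name `match_distinctive_queries`, the token names, the keywords, and all volumes are hypothetical and chosen purely for illustration.

```python
from typing import Dict


def match_distinctive_queries(
    query_volumes: Dict[str, int],
    keyword_volumes: Dict[str, int],
    top_n: int = 2,
) -> Dict[str, str]:
    """Greedily pair the top_n highest-volume query tokens with the
    unused keyword whose (assumed known) volume is closest.

    Illustrative simplification only: volume is the sole signal used.
    """
    # Rank observed query tokens by leaked response volume, highest first.
    ranked = sorted(query_volumes, key=lambda q: query_volumes[q], reverse=True)

    unused = set(keyword_volumes)
    matches: Dict[str, str] = {}
    for token in ranked[:top_n]:
        if not unused:
            break
        # Pick the keyword with the smallest volume gap; ties broken by name.
        best = min(
            unused,
            key=lambda k: (abs(keyword_volumes[k] - query_volumes[token]), k),
        )
        matches[token] = best
        unused.remove(best)
    return matches


if __name__ == "__main__":
    # Hypothetical leaked volumes for three opaque query tokens.
    observed = {"t1": 950, "t2": 40, "t3": 610}
    # Hypothetical auxiliary knowledge: approximate keyword volumes.
    auxiliary = {"meeting": 1000, "budget": 600, "lunch": 45, "audit": 30}
    print(match_distinctive_queries(observed, auxiliary))
```

In this toy setting the two high-volume tokens have no close competitors, so a nearest-volume match is unambiguous, whereas the low-volume token `t2` sits between two similar candidates. That asymmetry is the kind of distinctiveness the abstract refers to; recovering the remaining queries would need the additional signals the sketch omits.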