Much research has been done on user-generated textual passwords. Surprisingly, semantic information in such passwords remains under-investigated: existing studies focus mostly on passwords created by English- and/or Chinese-speaking users and consider only a limited set of semantics. This paper fills this gap by proposing a general framework based on semantically enhanced probabilistic context-free grammars (PCFG), named SE#PCFG. It allows us to consider 43 types of semantic information, the richest set considered so far, for semantic password analysis. Applying SE#PCFG to 17 large leaked password databases of users speaking four languages (English, Chinese, German and French), we demonstrate its usefulness and report a wide range of new insights about password semantics at different levels, such as cross-website password correlations. Furthermore, based on SE#PCFG and a new systematic smoothing method, we propose the Semantically Enhanced Password Cracking Architecture (SEPCA). We compared the performance of SEPCA against three state-of-the-art (SOTA) benchmarks, two other PCFG variants and FLA, in terms of the password coverage rate. Our experimental results show that SEPCA outperformed all three benchmarks consistently and significantly across 52 test cases, by up to 21.53%, 52.55% and 7.86%, respectively, at the user level (with duplicate passwords). At the level of unique passwords, SEPCA also beat the three benchmarks by up to 33.32%, 86.19% and 10.46%, respectively. These results demonstrate the power of SEPCA as a new password cracking framework.
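To illustrate the two evaluation levels mentioned in the abstract, the following minimal sketch computes a coverage rate both at the user level (duplicate passwords counted once per occurrence) and at the level of unique passwords. The abstract does not spell out the metric, so the definition used here (the fraction of test passwords that appear in the generated guess set) is an assumption, and the function name `coverage_rates` and the toy data are hypothetical.

```python
from collections import Counter


def coverage_rates(guesses, test_passwords):
    """Return (user_level, unique_level) coverage as fractions in [0, 1].

    Assumed definition, not taken from the paper: a test password is
    "covered" if it appears anywhere in the guess list.
    """
    if not test_passwords:
        raise ValueError("test_passwords must not be empty")

    guessed = set(guesses)
    counts = Counter(test_passwords)

    # User level: every occurrence counts, so popular passwords weigh more.
    user_hits = sum(n for pw, n in counts.items() if pw in guessed)
    # Unique level: each distinct password counts exactly once.
    unique_hits = sum(1 for pw in counts if pw in guessed)

    return user_hits / len(test_passwords), unique_hits / len(counts)


# Toy data for illustration only.
test_set = ["123456", "123456", "123456", "iloveyou", "Berlin2019", "qwerty"]
guesses = ["123456", "password", "qwerty"]

user_rate, unique_rate = coverage_rates(guesses, test_set)
print(f"user level: {user_rate:.2%}, unique level: {unique_rate:.2%}")
```

Under this assumed definition, the two levels can differ noticeably on the same guess list, because a few very popular passwords dominate the user-level figure but count only once each at the unique level.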