Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, an LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GANs) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT's sampling procedure to generate passwords matching arbitrary constraints, a capability lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.
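The idea behind guided generation and the probability analysis can be illustrated with a minimal sketch. It assumes a character-level autoregressive model exposed as a function returning the next-token distribution for a given prefix. `toy_model`, the `</s>` end token, the six-character vocabulary, and the `l`/`n`/`*` template symbols are hypothetical stand-ins, not PassGPT's actual model, API, or template syntax. Guided generation is shown as masking every token that violates the constraint at each step and sampling from the renormalized remainder. The same per-step distributions give the log-probability of a complete password and the entropy of each next-character choice.

```python
import math
import random
from typing import Callable, Dict

EOS = "</s>"  # hypothetical end-of-password token
CHARS = "abc012"  # toy character-level vocabulary (an assumption for illustration)
VOCAB = list(CHARS) + [EOS]

# Hypothetical template symbols; PassGPT's real constraint syntax may differ.
CLASSES: Dict[str, set] = {
    "l": set("abc"),  # lowercase letter
    "n": set("012"),  # digit
    "*": set(CHARS),  # any character
}

Model = Callable[[str], Dict[str, float]]


def toy_model(prefix: str) -> Dict[str, float]:
    """Stand-in for an autoregressive LM: returns P(next token | prefix)."""
    weights = {}
    for tok in VOCAB:
        if tok == EOS:
            # Ending becomes more likely as the password grows.
            weights[tok] = 0.1 * len(prefix)
        elif tok.isalpha():
            # Letters are favored in the first three positions.
            weights[tok] = 2.0 if len(prefix) < 3 else 0.5
        else:
            # Digits are favored afterwards.
            weights[tok] = 0.5 if len(prefix) < 3 else 2.0
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def guided_sample(model: Model, template: str, rng: random.Random) -> str:
    """Sample a password whose i-th character lies in CLASSES[template[i]]."""
    password = ""
    for symbol in template:
        allowed = CLASSES[symbol]
        probs = model(password)
        # Drop disallowed tokens (EOS included) and renormalize the rest.
        masked = {tok: p for tok, p in probs.items() if tok in allowed}
        total = sum(masked.values())
        tokens = list(masked)
        weights = [masked[tok] / total for tok in tokens]
        password += rng.choices(tokens, weights=weights, k=1)[0]
    return password


def log_prob(model: Model, password: str) -> float:
    """Natural-log probability of the full password, including the end token."""
    lp = 0.0
    for i, tok in enumerate(list(password) + [EOS]):
        # Chain rule: log P(x) = sum_i log P(x_i | x_<i).
        lp += math.log(model(password[:i])[tok])
    return lp


def next_token_entropy(model: Model, prefix: str) -> float:
    """Shannon entropy (in bits) of the model's next-token distribution."""
    return -sum(p * math.log2(p) for p in model(prefix).values() if p > 0)


if __name__ == "__main__":
    rng = random.Random(0)
    print([guided_sample(toy_model, "lln", rng) for _ in range(5)])
    print(log_prob(toy_model, "ab1"))
    print(next_token_entropy(toy_model, ""))
```

Because the mask is applied to the model's own distribution, every sample satisfies the template by construction and no generate-then-filter loop is needed. In this sketch the template also fixes the password length, since the end token is never allowed at a constrained position.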