We introduce the concept of a "universal password model": a password model that, once pre-trained, automatically adapts its guessing strategy to the target system. To achieve this, the model does not need access to any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution. Specifically, the model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. It then exploits those patterns to create a tailored password model for the target system at inference time. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides improving over current password strength estimation techniques and attacks, the model enables end users (e.g., system administrators) to autonomously generate tailored password models for their systems without the often unworkable requirements of collecting suitable training data and fitting the underlying machine learning model. Ultimately, our framework democratizes access to well-calibrated password models, addressing a major challenge in deploying password security solutions at scale.