We introduce the concept of a "universal password model": a password model that, once pre-trained, automatically adapts its guessing strategy to the target system. To do so, the model does not need access to any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution. Specifically, the model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. At inference time, it exploits those patterns to create a tailored password model for the target system. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides improving over current password strength estimation techniques and attacks, the model enables any end-user (e.g., a system administrator) to autonomously generate a tailored password model for their system, without the often unworkable requirements of collecting suitable training data and fitting the underlying machine learning model. Ultimately, our framework makes well-calibrated password models broadly available to the community, addressing a major challenge in deploying password security solutions at scale.