PassViz: A Visualisation System for Analysing Leaked Passwords

Passwords remain the most widely used form of user authentication, despite advances in other methods. Their limitations are well documented, particularly the susceptibility of weak, human-chosen passwords to attack. The existence of such weak passwords has led to repeated password leaks from websites, many of them large in scale. While these leaks are unfortunate security incidents, they give security researchers and practitioners an opportunity to gain insights that can help improve password policies and other password-related security controls. Researchers have proposed various data visualisation techniques to help analyse leaked passwords. However, many of these approaches rely solely on frequency analysis, and distance-based graphs remain little explored. This paper presents PassViz, a novel method that combines edit distance with the t-SNE (t-distributed stochastic neighbour embedding) dimensionality reduction algorithm to visualise and analyse leaked passwords in a 2-D space. We implemented PassViz as an easy-to-use command-line tool for visualising large-scale password databases, and also as a graphical user interface (GUI) that supports interactive visual analytics of small password databases. Using the "000webhost" leaked database as an example, we show how PassViz can be used to visually analyse different aspects of leaked passwords and to facilitate the discovery of previously unknown password patterns. Overall, our approach helps researchers and practitioners gain insights into leaked passwords and improve password security through effective data visualisation and analysis.
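
The combination of edit distance and t-SNE described above can be illustrated with a minimal sketch. This is not the PassViz implementation: the function names (`edit_distance`, `distance_matrix`, `embed_2d`), the sample passwords, and the parameter values are all assumptions made for illustration, and the t-SNE step assumes that NumPy and scikit-learn are installed. The sketch computes every pairwise Levenshtein distance, which is quadratic in the number of passwords and therefore only practical for small samples.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distance_matrix(passwords):
    """Symmetric matrix of pairwise edit distances (O(n^2) comparisons)."""
    n = len(passwords)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = edit_distance(passwords[i], passwords[j])
    return dist


def embed_2d(dist, perplexity=2.0, seed=0):
    """Project a precomputed distance matrix to 2-D with scikit-learn's t-SNE.

    Assumption: scikit-learn and NumPy are available. With a precomputed
    metric, t-SNE needs init="random", and perplexity must be smaller than
    the number of samples.
    """
    import numpy as np
    from sklearn.manifold import TSNE

    tsne = TSNE(n_components=2, metric="precomputed", init="random",
                perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(np.asarray(dist, dtype=float))


if __name__ == "__main__":
    # Hypothetical sample; a real analysis would load a leaked-password list.
    sample = ["password", "passw0rd", "password1",
              "qwerty", "qwerty123", "letmein"]
    coords = embed_2d(distance_matrix(sample))
    for pw, (x, y) in zip(sample, coords):
        print(f"{pw:>10}  {x:8.2f}  {y:8.2f}")
```

In such an embedding, passwords separated by a small edit distance (for example "password" and "passw0rd") tend to be placed close together, which is what makes distance-based views complementary to frequency analysis. The exact coordinates depend on the random seed and the perplexity setting.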
