PassViz: A Visualisation System for Analysing Leaked Passwords

From arXiv. Please cite this paper as follows: Sam Parker, Haiyue Yuan and Shujun Li (2023). PassViz: An Interactive Visualisation System for Analysing Leaked Passwords. In Proceedings of the 2023 20th IEEE Symposium on Visualization for Cyber Security (VizSec 2023), pp. 33-42, IEEE, doi: 10.1109/VizSec60606.2023.00011

Passwords remain the most widely used form of user authentication, despite advances in other methods. However, their limitations, such as susceptibility to attacks, especially when human users choose weak passwords, are well documented. Weak human-chosen passwords have led to repeated password leaks from websites, many of them large-scale. While such leaks are unfortunate security incidents, they give security researchers and practitioners good opportunities to learn from the leaked passwords and to identify ways to improve password policies and other password-related security controls. Researchers have proposed different data visualisation techniques to help analyse leaked passwords. However, many approaches rely solely on frequency analysis, with limited exploration of distance-based graphs. This paper reports PassViz, a novel method that combines the edit distance with the t-SNE (t-distributed stochastic neighbour embedding) dimensionality reduction algorithm to visualise and analyse leaked passwords in a 2-D space. We implemented PassViz as an easy-to-use command-line tool for visualising large-scale password databases, and as a graphical user interface (GUI) that supports interactive visual analytics of small password databases. Using the "000webhost" leaked database as an example, we show how PassViz can be used to visually analyse different aspects of leaked passwords and to facilitate the discovery of previously unknown password patterns. Overall, our approach empowers researchers and practitioners to gain valuable insights and improve password security through effective data visualisation and analysis.
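The core idea described above, computing pairwise edit distances between passwords and letting t-SNE place similar passwords close together in 2-D, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PassViz implementation: it assumes Levenshtein distance as the edit distance, and the helper names `levenshtein`, `distance_matrix` and `embed_2d` are hypothetical. The distance functions use only the standard library; the embedding step assumes scikit-learn's `TSNE` (and NumPy) is installed and is only imported when `embed_2d` is called.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (assumed edit distance)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def distance_matrix(passwords: list[str]) -> list[list[int]]:
    """Symmetric matrix of pairwise edit distances (hypothetical helper)."""
    n = len(passwords)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein(passwords[i], passwords[j])
            dist[i][j] = dist[j][i] = d
    return dist


def embed_2d(dist: list[list[int]], random_state: int = 0):
    """Project precomputed distances to 2-D with t-SNE.
    Assumption: uses scikit-learn's TSNE, not necessarily what PassViz uses."""
    import numpy as np
    from sklearn.manifold import TSNE

    matrix = np.asarray(dist, dtype=float)
    n = matrix.shape[0]
    if n < 2:
        raise ValueError("t-SNE needs at least two passwords")
    tsne = TSNE(
        n_components=2,
        metric="precomputed",  # feed the edit distances in directly
        init="random",  # PCA initialisation is rejected for precomputed distances
        perplexity=min(30.0, n - 1),  # perplexity must be below the sample count
        random_state=random_state,
    )
    return tsne.fit_transform(matrix)  # array of shape (n, 2)
```

For example, `distance_matrix(["password", "password1", "passw0rd", "123456"])` puts the three "password" variants one or two edits apart and "123456" eight edits from "password", so t-SNE would tend to draw the variants as a tight cluster. Because the full pairwise matrix grows quadratically with the number of passwords, this naive sketch only suits small samples; it says nothing about how PassViz itself handles large databases.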
