PassViz: A Visualisation System for Analysing Leaked Passwords

Passwords remain the most widely used form of user authentication, despite advances in other methods. Their limitations are well documented, particularly the susceptibility of weak, human-defined passwords to attacks. The existence of such weak passwords has led to repeated password leaks from websites, many of them large in scale. While these leaks are unfortunate security incidents, they give security researchers and practitioners an opportunity to gain insights from the leaked passwords and to identify ways to improve password policies and other password-related security controls. Researchers have proposed various data visualisation techniques to help analyse leaked passwords. However, many approaches rely solely on frequency analysis, with limited exploration of distance-based graphs. This paper reports PassViz, a novel method that combines the edit distance with the t-SNE (t-distributed stochastic neighbour embedding) dimensionality reduction algorithm for visualising and analysing leaked passwords in a 2-D space. We implemented PassViz as an easy-to-use command-line tool for visualising large-scale password databases, and also as a graphical user interface (GUI) to support interactive visual analytics of small password databases. Using the "000webhost" leaked database as an example, we show how PassViz can be used to visually analyse different aspects of leaked passwords and to facilitate the discovery of previously unknown password patterns. Overall, our approach helps researchers and practitioners gain insights into leaked passwords and improve password security through data visualisation and analysis.
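To make the distance-based idea concrete, the following is a minimal sketch of the first half of such a pipeline: computing pairwise edit (Levenshtein) distances between passwords. It is not the PassViz implementation. The function names `edit_distance` and `distance_matrix` and the sample passwords are assumptions made purely for illustration.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def distance_matrix(passwords: Sequence[str]) -> List[List[int]]:
    """Symmetric matrix of pairwise edit distances (hypothetical helper)."""
    n = len(passwords)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(passwords[i], passwords[j])
    return d


# Illustrative sample only; these are not taken from the 000webhost data.
sample = ["password", "password1", "passw0rd", "qwerty"]
matrix = distance_matrix(sample)
```

A matrix like this could then be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`, to obtain 2-D coordinates. Because the matrix grows quadratically with the number of passwords, this sketch is only practical for small samples.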
