The rise of QR code-based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods focus predominantly on URL analysis, which requires extracting the QR code payload and may inadvertently expose users to malicious content. Moreover, QR codes can encode many types of data beyond URLs, such as Wi-Fi credentials and payment information, so URL-based detection cannot address these broader security concerns. To address these gaps, we propose the first framework for quishing detection that analyzes QR code structure and pixel patterns directly, without extracting the embedded content. We generate a dataset of phishing and benign QR codes and use it to train and evaluate multiple machine learning models: Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, LightGBM, and XGBoost. Our best-performing model, XGBoost, achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual patterns correlated with phishing labels and refine our feature set by removing non-informative pixels, which raises the AUC to 0.9133 with a reduced feature space. Our findings show that the structural features of QR codes correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.
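The pipeline summarized above (pixel-level features, importance-based pruning, AUC evaluation) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. It assumes each QR code is already available as a binary pixel or module grid (1 = dark, 0 = light), so no payload is decoded. It also assumes per-feature importance scores come from a trained model (for example, a gradient-boosted tree ensemble such as XGBoost). The helper names `flatten_modules`, `prune_features`, and `roc_auc`, and the "keep only importance > 0" threshold, are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def flatten_modules(grid: Sequence[Sequence[int]]) -> List[int]:
    """Flatten a binary QR pixel/module grid row by row into one feature vector.

    Assumption: the grid is the rendered QR image (1 = dark, 0 = light),
    so the embedded payload is never decoded.
    """
    return [int(cell) for row in grid for cell in row]


def prune_features(
    rows: Sequence[Sequence[int]],
    importances: Sequence[float],
    threshold: float = 0.0,  # hypothetical cutoff: drop features at or below it
) -> Tuple[List[int], List[List[int]]]:
    """Keep only features whose importance exceeds `threshold`.

    Returns the kept column indices and the reduced feature rows.
    """
    keep = [i for i, w in enumerate(importances) if w > threshold]
    pruned = [[row[i] for i in keep] for row in rows]
    return keep, pruned


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive (phishing, label 1)
    scores higher than a random negative (benign, label 0); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 3x3 grid standing in for a real QR image.
    features = flatten_modules([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
    print(features)  # [1, 0, 1, 0, 1, 0, 1, 1, 1]

    # Hypothetical importances, as a trained model might report them.
    imps = [0.0, 0.3, 0.0, 0.1, 0.0, 0.2, 0.0, 0.0, 0.4]
    keep, reduced = prune_features([features], imps)
    print(keep, reduced)  # [1, 3, 5, 8] [[0, 0, 0, 1]]

    # Toy labels and model scores.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.6]))  # 0.875
```

Pruning by importance mirrors the idea of discarding pixels that carry no signal. For instance, regions such as finder patterns look the same in every code of a given version and so cannot separate the classes. The pairwise AUC is O(P·N) and is only meant to make the metric concrete; on a real dataset a library routine would be the practical choice.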