Automatically extracting personal information, such as names, phone numbers, and email addresses, from publicly available profiles at a large scale is a stepping stone to many other security attacks, including spear phishing. Traditional methods, such as regular expressions, keyword search, and entity detection, achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets, including a synthetic dataset generated by GPT-4 and three real-world datasets with eight manually labeled categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLMs can be misused by attackers to accurately extract various personal information from personal profiles; LLMs outperform traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing them to the level of less effective traditional ones.