Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods -- such as regular expression, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
翻译:从公开可得的个人资料中大规模自动提取个人信息(如姓名、电话号码和电子邮件地址)是实施鱼叉式网络钓鱼等众多安全攻击的关键步骤。传统方法(如正则表达式、关键词搜索和实体检测)在此类个人信息提取任务中效果有限。本研究通过系统性测量分析,对基于大型语言模型(LLM)的个人信息提取方法及其防御对策进行了基准评估。为此,我们提出了一个基于LLM的提取攻击框架;构建了四个数据集,包括一个由GPT-4生成的合成数据集和三个包含人工标注八类个人信息的真实数据集;提出了一种基于提示注入的新型防御策略;并利用十个LLM模型和五个数据集对基于LLM的攻击与防御方法进行了系统性基准测试。我们的主要发现包括:攻击者可滥用LLM从个人资料中准确提取各类个人信息;LLM的性能优于传统方法;而提示注入技术能有效抵御基于LLM的强攻击,将其削弱至与传统方法相当的低效水平。