Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods -- such as regular expression, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
翻译:从公开资料中大规模自动提取个人信息(如姓名、电话号码和电子邮件地址)是许多其他安全攻击(包括鱼叉式网络钓鱼)的基石。传统方法(如正则表达式、关键词搜索和实体检测)在此类个人信息提取方面效果有限。本研究开展系统性测量评估,对基于大语言模型的个人信息提取及其防御措施进行基准测试。为此,我们构建了基于LLM的提取攻击框架;收集了四个数据集,包括由GPT-4生成的合成数据集和三个手动标注了八类个人信息的真实世界数据集;引入了一种基于提示注入的创新防御策略;并使用十个LLM和五个数据集系统性地对基于LLM的攻击与防御措施进行基准测试。主要发现包括:攻击者可滥用LLM从个人资料中精准提取多种个人信息;LLM的性能优于传统方法;提示注入能有效抵御基于LLM的强攻击,将其削弱至效果较差的传统攻击水平。