Large Language Models (LLMs) have made it easier to create realistic fake profiles on platforms like LinkedIn, posing a significant challenge to text-based fake-profile detectors. In this study, we evaluate the robustness of existing detectors against LLM-generated profiles. While highly effective at detecting manually created fake profiles (False Accept Rate: 6-7%), existing detectors fail to identify GPT-generated profiles (False Accept Rate: 42-52%). We propose GPT-assisted adversarial training as a countermeasure, restoring the False Accept Rate to 1-7% without affecting the False Reject Rate (0.5-2%). Ablation studies reveal that detectors trained on combined numerical and textual embeddings exhibit the highest robustness, followed by those using numerical-only embeddings, and lastly those using textual-only embeddings. A complementary analysis of the detection ability of prompt-based GPT-4 Turbo and human evaluators affirms the need for robust automated detectors such as the one proposed in this study.
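To make the countermeasure concrete, the following is a minimal sketch of GPT-assisted adversarial training for a fake-profile detector. It assumes scikit-learn, TF-IDF textual embeddings, two toy numerical features (connection count and a profile-completeness score), and a logistic-regression classifier; all of these choices, and the placeholder profile texts, are illustrative assumptions rather than the paper's actual pipeline.

```python
# Minimal sketch: GPT-assisted adversarial training for a fake-profile
# detector over combined numerical + textual embeddings.
# Assumptions: scikit-learn, TF-IDF text features, toy placeholder data.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder profile summaries (hypothetical, not from the study's dataset).
real_texts = [
    "Software engineer with 8 years in distributed systems.",
    "Registered nurse specializing in pediatric care.",
]
manual_fake_texts = [
    "CEO expert guru influencer contact me for deals!!!",
    "Top professional best services worldwide cheap fast.",
]
gpt_fake_texts = [  # LLM-generated adversarial fakes, labeled as fake
    "Product manager passionate about data-driven growth strategies.",
    "Cloud architect focused on scalable, secure infrastructure.",
]

texts = real_texts + manual_fake_texts + gpt_fake_texts
labels = np.array([0, 0, 1, 1, 1, 1])  # 0 = genuine, 1 = fake

# Toy numerical features per profile: [connection_count, completeness_score].
nums = np.array([[520, 0.90], [310, 0.80],
                 [12, 0.30], [25, 0.40],
                 [180, 0.70], [210, 0.75]])

# Combined embedding: concatenate textual and numerical views.
vec = TfidfVectorizer()
X_text = vec.fit_transform(texts)        # textual embedding (sparse)
X = hstack([X_text, csr_matrix(nums)])   # numerical + textual features

# The adversarial-training step: because GPT-generated fakes are included
# in the training set with label 1, the detector learns to reject them too.
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score an unseen (hypothetical) profile.
probe = hstack([
    vec.transform(["Seasoned analyst driving business insights."]),
    csr_matrix([[200, 0.70]]),
])
print("P(fake) =", clf.predict_proba(probe)[0, 1])
```

In this sketch, dropping the `nums` columns yields a textual-only detector and dropping `X_text` yields a numerical-only one, which is the axis the abstract's ablation varies; the combined view is reported as the most robust.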