Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts

Detecting online sexual predatory behaviours and abusive language on social media platforms has become a critical research area because of growing concerns about online safety, especially for vulnerable populations such as children and adolescents. Researchers have explored a range of techniques for building detection systems that can identify and mitigate these risks. The recent development of large language models (LLMs) has opened a new opportunity to address this problem more effectively. This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model, recently released by Meta GenAI. We fine-tune the LLM on datasets of different sizes, degrees of imbalance, and languages (English, Roman Urdu, and Urdu). Because it builds on the capabilities of an LLM, our approach is generic and automated: unlike conventional methods in this domain, it does not require a manual search for a good pairing of feature extraction and classifier design steps. Experimental results show that the proposed approach performs strongly and consistently across five sets of experiments on three distinct datasets. These outcomes indicate that the method can be deployed in real-world applications, including those involving non-English languages, to flag sexual predators, offensive or toxic content, hate speech, and discriminatory language in online discussions and comments, and so help maintain respectful online communities. It can also be applied to other text classification problems, such as sentiment analysis, spam and phishing detection, sorting legal documents, fake news detection, language identification, user intent recognition, text-based product categorization, medical record analysis, and resume screening.
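The abstract refers to datasets with different degrees of imbalance and to consistent performance across them. As a minimal illustrative sketch only, the snippet below shows one common way to quantify class imbalance and why per-class metrics are usually reported alongside accuracy on such data. The function names, the toy labels, and the choice of precision, recall, and F1 are assumptions made for illustration; the abstract does not state which metrics or data splits the authors used.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (hypothetical helper)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): 1 = flagged message, 0 = benign message.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]  # one false positive, one hit, one miss

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ratio = imbalance_ratio(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(f"imbalance ratio: {ratio}")
print(f"accuracy: {accuracy}")
print(f"precision: {precision}, recall: {recall}, F1: {f1}")
```

On this toy split the accuracy is 0.8 while precision, recall, and F1 on the flagged class are each 0.5, which is why accuracy alone can overstate performance when the positive class is rare.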
