This paper examines how effectively large language models (LLMs) can detect public threats posted online. Amid rising concerns over the spread of threatening rhetoric and advance notices of violence, automated content analysis techniques may aid in early identification and moderation. Custom data collection tools were developed to gather post titles from a popular Korean online community, yielding a dataset of 500 non-threat examples and 20 threats. Several LLMs (GPT-3.5, GPT-4, PaLM) were prompted to classify individual posts as either "threat" or "safe." Statistical analysis found that all models demonstrated strong accuracy, passing chi-square goodness-of-fit tests for both threat and non-threat identification. GPT-4 performed best overall, with 97.9% accuracy on non-threat posts and 100% accuracy on threats. A cost analysis also showed PaLM API pricing to be highly cost-efficient. The findings indicate that LLMs can effectively augment human content moderation at scale to help mitigate emerging online risks. However, bias, transparency, and ethical oversight remain vital considerations before real-world implementation.
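As an illustration of the chi-square goodness-of-fit test mentioned above, the following is a minimal sketch in Python that compares a classifier's correct and incorrect counts against the counts expected under a target accuracy. The abstract does not state the null hypothesis or the raw counts used in the study, so the function names, the target accuracy of 0.97, and the counts of 490 correct out of 500 are all hypothetical values chosen for illustration; they are not the paper's data or procedure.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    For df = 1 the survival function reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


def accuracy_gof(correct, total, target_accuracy):
    """Test observed correct/incorrect counts against an assumed target accuracy."""
    observed = [correct, total - correct]
    expected = [total * target_accuracy, total * (1.0 - target_accuracy)]
    statistic = chi_square_gof(observed, expected)
    return statistic, p_value_df1(statistic)


if __name__ == "__main__":
    # Hypothetical counts and target accuracy, NOT taken from the paper.
    statistic, p = accuracy_gof(correct=490, total=500, target_accuracy=0.97)
    print(f"chi2 = {statistic:.3f}, p = {p:.3f}")
```

In this framing, a large p-value means the observed counts are consistent with the assumed target accuracy. As a general caution, the chi-square approximation becomes unreliable when expected counts are small, which is likely to matter for a class with only 20 examples.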