The rapid integration of large language models (LLMs) into newsroom workflows has raised urgent questions about the prevalence of AI-generated content in online media. While computational studies have begun to quantify this phenomenon in English-language outlets, no empirical investigation exists for Turkish news media, where existing research remains limited to qualitative interviews with journalists or to fake-news detection. This study addresses that gap by fine-tuning a Turkish-specific BERT model (dbmdz/bert-base-turkish-cased) for binary classification of AI-rewritten content, using a labeled dataset of 3,600 articles from three major Turkish outlets with distinct editorial orientations. The model achieves an F1 score of 0.9708 on the held-out test set, with symmetric precision and recall across both classes. Subsequent deployment on over 3,500 unseen articles published between 2023 and 2026 reveals consistent cross-source and temporally stable classification patterns, with mean prediction confidence exceeding 0.96 and an estimated 2.5% of examined news content, on average, rewritten or revised by LLMs. To the best of our knowledge, this is the first study to move beyond self-reported journalist perceptions toward empirical, data-driven measurement of AI usage in Turkish news media.
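The abstract reports precision, recall, F1, and an estimated share of AI-rewritten articles. As a minimal sketch of how such figures can be derived from binary predictions, the Python below computes them from scratch. The function names, the toy labels and probabilities, and the 0.5 decision threshold are illustrative assumptions, not the study's code or data, and the abstract does not state which F1 averaging scheme was used.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict:
    """Precision, recall, and F1 for one class, from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def flagged_share(probabilities: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of articles whose predicted P(AI-rewritten) meets the threshold."""
    if not probabilities:
        return 0.0
    return sum(1 for p in probabilities if p >= threshold) / len(probabilities)


# Toy data (hypothetical): 1 = AI-rewritten, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_f1(y_true, y_pred)

# Hypothetical classifier probabilities for eight unseen articles.
share = flagged_share([0.98, 0.02, 0.01, 0.03, 0.97, 0.04, 0.02, 0.01])

print(scores, share)
```

Calling `binary_f1` once per class (with `positive=0` and `positive=1`) gives the per-class figures needed to check whether precision and recall are symmetric across classes, while `flagged_share` applied to deployment-time probabilities gives a raw prevalence estimate under the chosen threshold.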