We present Algerian Dialect, a large-scale sentiment-annotated dataset of 45,000 YouTube comments written in the Algerian Arabic dialect. The comments were collected from more than 30 Algerian press and media channels using the YouTube Data API. Each comment is manually annotated with one of five sentiment categories: very negative, negative, neutral, positive, and very positive. In addition to sentiment labels, the dataset includes metadata such as collection timestamps, like counts, video URLs, and annotation dates. The dataset addresses the scarcity of publicly available resources for the Algerian dialect and aims to support research in sentiment analysis, dialectal Arabic NLP, and social media analytics. It is publicly available on Mendeley Data under a CC BY 4.0 license at https://doi.org/10.17632/zzwg3nnhsz.2.
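As a rough illustration of how the five-class scheme and per-comment metadata described above might be represented in code, the following Python sketch defines an ordinal label type and a record type. The field names (`text`, `like_count`, `video_url`, `collected_at`, `annotated_on`), the integer encoding of the labels, and the helper functions are assumptions made for this example; the released files may use a different layout.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Sentiment(IntEnum):
    """The five categories named in the abstract, ordered from most
    negative to most positive. The 0-4 encoding is an assumption."""
    VERY_NEGATIVE = 0
    NEGATIVE = 1
    NEUTRAL = 2
    POSITIVE = 3
    VERY_POSITIVE = 4


@dataclass
class Comment:
    # Hypothetical field names; check the released files for the real ones.
    text: str
    sentiment: Sentiment
    like_count: int
    video_url: str
    collected_at: str   # collection timestamp, kept as a raw string here
    annotated_on: str   # annotation date, kept as a raw string here


def parse_label(raw: str) -> Sentiment:
    """Map a label string such as 'very negative' to its Sentiment member."""
    return Sentiment[raw.strip().upper().replace(" ", "_")]


def label_distribution(comments):
    """Return the fraction of comments falling in each sentiment class."""
    counts = Counter(c.sentiment for c in comments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts.get(label, 0) / total for label in Sentiment}
```

Treating the labels as an ordered enumeration keeps the option of using the data either as a five-way classification task or as an ordinal scale, for example by merging the two negative and two positive classes into a three-way scheme.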