Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese, whose rich vocabulary, complex grammar, and regional variations complicate the task. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E takes an open-source approach that fosters collaboration within the research community. We conduct a detailed analysis using advanced techniques such as BERT models, contributing to both academic understanding and practical applications.