Memes are the new-age conveyance mechanism for humor on social media sites. They often combine an image with some text. Because memes can be used to promote disinformation or hatred, it is crucial to investigate them in detail. We introduce Memotion 3, a new dataset of 10,000 annotated memes. Unlike other prevalent datasets in the domain, including prior iterations of Memotion, which were limited to English memes, Memotion 3 introduces Hindi-English code-mixed memes. We describe the Memotion task, the data collection, and the dataset creation methodology. We also provide a baseline for the task. The baseline code and dataset will be made available at https://github.com/Shreyashm16/Memotion-3.0