The prevalence of multi-modal content on social media complicates automated moderation strategies. This calls for improved multi-modal classification and a deeper understanding of the implicit meanings in images and memes. Although previous efforts have aimed at improving model performance through fine-tuning, few have explored an end-to-end optimization pipeline that jointly accounts for modalities, prompting, labeling, and fine-tuning. In this study, we propose an end-to-end conceptual framework for model optimization in complex tasks. Experiments support the efficacy of this traditional yet novel framework, which achieves the highest accuracy and AUROC among the settings we evaluate. Ablation experiments show that the individual optimizations are not ineffective when applied in isolation.