Movie posters are not just decorative; they are meticulously designed to capture the essence of a movie, such as its genre, storyline, and tone. For decades, movie posters have graced cinema walls and billboards, and they now appear on our screens in digital form. Movie genre classification plays a pivotal role in film marketing, audience engagement, and recommendation systems. Previous work on movie genre classification has relied mostly on plot summaries, subtitles, trailers, and movie scenes. Movie posters provide a tantalizing pre-release glimpse into a film's key aspects, which can ignite public interest. In this paper, we present a framework that exploits movie posters from both a visual and a textual perspective to address the multilabel movie genre classification problem. First, we extract text from movie posters using optical character recognition (OCR) and retrieve the corresponding text embedding. Next, we introduce a cross-attention-based fusion module that allocates attention weights to the visual and textual embeddings. To validate our framework, we use 13,882 posters sourced from the Internet Movie Database (IMDb). The experimental results indicate that our model performs promisingly and outperforms some prominent contemporary architectures.
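The abstract does not spell out the fusion module, so the following is only a minimal sketch of the general idea, under stated assumptions: random vectors stand in for the image-encoder and OCR-text embeddings, the attention is plain scaled dot-product attention without learned projections, and a per-genre sigmoid produces the multilabel output. All function names, dimensions, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention in which each query vector attends over
    keys_values (used as both keys and values; no learned projections)."""
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(d)])
    return attended


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def fuse_and_classify(visual, textual, weight, bias, threshold=0.5):
    """Fuse both modalities with cross-attention, then score each genre
    independently with a sigmoid (multilabel, not softmax)."""
    vis_ctx = mean_pool(cross_attention(visual, textual))  # image attends to text
    txt_ctx = mean_pool(cross_attention(textual, visual))  # text attends to image
    fused = vis_ctx + txt_ctx  # list concatenation, length 2 * d
    probs = []
    for w_row, b in zip(weight, bias):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    labels = [p >= threshold for p in probs]
    return probs, labels


# Placeholder inputs: 4 visual tokens, 3 OCR text tokens, 5 hypothetical genres.
rng = random.Random(0)
d, n_genres = 8, 5
visual = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(4)]
textual = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]
weight = [[rng.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(n_genres)]
bias = [0.0] * n_genres

probs, labels = fuse_and_classify(visual, textual, weight, bias)
print([round(p, 3) for p in probs], labels)
```

Because each genre receives its own sigmoid score, a poster can be assigned several genres at once, which is what distinguishes the multilabel setting from single-label classification. In a trained system the classifier weights, and typically query, key, and value projections, would be learned rather than sampled at random.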