Movie posters are not just decorative; they are meticulously designed to capture the essence of a movie, such as its genre, storyline, and tone. For decades, movie posters have graced cinema walls and billboards, and they now appear on our screens as digital posters. Movie genre classification plays a pivotal role in film marketing, audience engagement, and recommendation systems. Previous work on movie genre classification has focused mostly on plot summaries, subtitles, trailers, and movie scenes. Movie posters provide a tantalizing pre-release glimpse into a film's key aspects, which can ignite public interest. In this paper, we present a framework that exploits movie posters from both visual and textual perspectives to address the multilabel movie genre classification problem. First, we extract text from movie posters using OCR and retrieve the relevant embeddings. Next, we introduce a cross-attention-based fusion module that allocates attention weights to the visual and textual embeddings. To validate our framework, we used 13,882 posters sourced from the Internet Movie Database (IMDb). The experimental results indicate that our model achieved promising performance and outperformed even some prominent contemporary architectures.
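To make the fusion step concrete, the following is a minimal NumPy sketch of cross-attention between visual and textual embeddings, followed by a sigmoid head for multilabel prediction. It is an illustration under stated assumptions, not the framework's actual implementation: the bidirectional attention, mean pooling, embedding sizes, random weights, 0.5 threshold, and all function and parameter names (`cross_attention`, `fuse_and_classify`, `params`) are hypothetical, and the random arrays stand in for real image-encoder and OCR-text embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    # Scaled dot-product attention where queries come from one modality
    # and keys/values come from the other.
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

def fuse_and_classify(visual, textual, params):
    # Assumption: attend in both directions, mean-pool each result,
    # concatenate, then apply a linear layer with a per-genre sigmoid.
    vis_ctx, _ = cross_attention(visual, textual, *params["v2t"])
    txt_ctx, _ = cross_attention(textual, visual, *params["t2v"])
    fused = np.concatenate([vis_ctx.mean(axis=0), txt_ctx.mean(axis=0)])
    logits = fused @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
d_model, d_attn, n_genres = 16, 8, 5  # hypothetical sizes

def random_projections():
    # Untrained placeholder weights for the query, key, and value projections.
    return tuple(rng.normal(scale=0.1, size=(d_model, d_attn)) for _ in range(3))

params = {
    "v2t": random_projections(),
    "t2v": random_projections(),
    "w_out": rng.normal(scale=0.1, size=(2 * d_attn, n_genres)),
    "b_out": np.zeros(n_genres),
}

visual = rng.normal(size=(49, d_model))   # stand-in for image patch embeddings
textual = rng.normal(size=(12, d_model))  # stand-in for OCR token embeddings

probs = fuse_and_classify(visual, textual, params)
predicted = probs >= 0.5  # multilabel: each genre is decided independently
```

Because each genre receives its own sigmoid probability rather than a share of a single softmax, a poster can be assigned several genres at once, which is what distinguishes the multilabel setting from single-label classification.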