Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

The expanding market for e-comics has spurred interest in automated methods for analyzing comics. A deeper understanding of comics requires an automated way to link each text in a comic to the character who speaks it. Comics speaker detection has practical applications, such as automatic character assignment for audiobooks, automatic translation that reflects characters' personalities, and inference of character relationships and stories. To address the shortage of speaker-to-text annotations, we created a new annotation dataset, Manga109Dialog, based on Manga109. Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided the dataset into levels by prediction difficulty to evaluate speaker detection methods more appropriately. Unlike existing methods, which are mainly based on distance, we propose a deep learning-based method that uses scene graph generation models. To exploit the unique features of comics, we further improve the proposed model by taking the frame reading order into account. We conducted experiments on Manga109Dialog and other datasets. The results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
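For readers unfamiliar with the distance-based family of methods the abstract contrasts against, the following is a minimal sketch of one such baseline: each text box is assigned to the character whose bounding-box center is nearest. This is an illustration only, not the paper's scene-graph method and not necessarily the exact baseline the authors compared with. The `Box` class, its field names, and both functions are hypothetical names introduced here.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box with a label (hypothetical structure)."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def nearest_speaker(text: Box, characters: List[Box]) -> Optional[str]:
    """Return the label of the character whose center is closest to the text."""
    if not characters:
        return None
    tx, ty = text.center
    closest = min(
        characters,
        key=lambda c: hypot(c.center[0] - tx, c.center[1] - ty),
    )
    return closest.label


def assign_speakers(texts: List[Box], characters: List[Box]) -> Dict[str, Optional[str]]:
    """Map each text label to its nearest character label."""
    return {t.label: nearest_speaker(t, characters) for t in texts}


if __name__ == "__main__":
    # Toy page: two characters and two speech texts (made-up coordinates).
    characters = [Box("A", 0, 0, 10, 10), Box("B", 100, 0, 110, 10)]
    texts = [Box("t1", 12, 0, 20, 8), Box("t2", 90, 0, 98, 8)]
    print(assign_speakers(texts, characters))
```

A rule like this ignores frame boundaries and reading order, so it can pick a character in a neighboring frame, or none of the right ones when the speaker is off-panel. The abstract names frame reading order as the comics-specific cue used to improve the proposed model.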

