Due to the significant time and effort required to handcraft translations, most manga never leave the domestic Japanese market. Automatic manga translation is a promising solution, but the field is still nascent and poses complexities beyond those of standard translation, since visual elements must be incorporated into the translation process to resolve ambiguities. In this work, we investigate to what extent multimodal large language models (LLMs) can provide effective manga translation, thereby assisting manga authors and publishers in reaching wider audiences. Specifically, we propose a methodology that leverages the vision component of multimodal LLMs to improve translation quality, evaluate the impact of translation unit size and context length, and propose a token-efficient approach for manga translation. Moreover, we introduce a new evaluation dataset -- the first parallel Japanese-Polish manga translation dataset -- as part of a benchmark for future research. Finally, we contribute an open-source software suite that enables others to benchmark LLMs for manga translation. Our findings demonstrate that our proposed methods achieve state-of-the-art results for Japanese-English translation and set a new standard for Japanese-Polish.