Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment

From arXiv. The approach achieved the top position in the public ranking of DataVerse Challenge - ITVerse 2023 (https://www.kaggle.com/competitions/dataverse_2023/) with a word error rate of 0.10582. All code can be found on the competition webpage.

The International Phonetic Alphabet (IPA) is indispensable in language learning and understanding, helping users pronounce and comprehend words accurately. It also plays a pivotal role in speech therapy, linguistic research, accurate transliteration, and the development of text-to-speech systems, making it an essential tool across diverse fields. Bangla is the seventh most widely used language, which creates a need for IPA transcription in its domain. Its text-to-IPA mapping is too diverse to capture manually, which motivates the use of artificial intelligence and machine learning in this field. In this study, we use a transformer-based sequence-to-sequence model operating at the letter and symbol level to produce the IPA of each Bangla word individually, since the IPA of a word varies very little with the words around it. Our transformer model has only 8.5 million parameters, with a single encoder layer and a single decoder layer. To handle punctuation marks and foreign-language text, we use manual mapping, because the model would not be able to learn to separate them from Bangla words; this also reduces the required computational resources. Finally, maintaining the relative position of each sentence component's IPA while generating the combined IPA led us to achieve the top position, with a word error rate of 0.10582, in the public ranking of DataVerse Challenge - ITVerse 2023 (https://www.kaggle.com/competitions/dataverse_2023/).
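The sentence-level flow described in the abstract (word-by-word transcription, manual mapping for punctuation and foreign text, and recombination in the original order) might be sketched as follows. This is a minimal illustration and not the authors' code: the function name `transcribe_sentence`, the `word_to_ipa` callable standing in for the trained transformer, the optional `punct_map` table, and the tokenizing regex are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Optional

# Assumption: a "Bangla word" is a run of characters from the Bengali Unicode
# block (U+0980..U+09FF), optionally containing zero-width joiners.
# Anything else is either a foreign word or a single punctuation mark.
TOKEN_RE = re.compile(
    r"(?P<bangla>[\u0980-\u09FF\u200c\u200d]+)"
    r"|(?P<foreign>\w+)"
    r"|(?P<punct>[^\w\s])"
)


def transcribe_sentence(
    text: str,
    word_to_ipa: Callable[[str], str],
    punct_map: Optional[Dict[str, str]] = None,
) -> str:
    """Replace each token in place so every piece keeps its relative position.

    word_to_ipa is a hypothetical stand-in for the character-level
    transformer; punct_map is a hypothetical manual punctuation table.
    """
    table = punct_map or {}

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if match.lastgroup == "bangla":
            # Only Bangla words are sent to the model, one word at a time.
            return word_to_ipa(token)
        if match.lastgroup == "punct":
            # Punctuation goes through the manual table (identity if unmapped).
            return table.get(token, token)
        # Foreign words are passed through unchanged in this sketch.
        return token

    # re.sub leaves unmatched text (whitespace) untouched, so the output
    # preserves the spacing and order of the input sentence.
    return TOKEN_RE.sub(replace, text)
```

With a placeholder model such as `lambda w: "<" + w + ">"`, only the Bangla words are wrapped, while foreign words, punctuation, and spacing stay where they were. Keeping non-Bangla tokens away from the model is what lets it stay small, since it only ever sees Bangla character sequences.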
