Resources in high-resource languages have not been efficiently exploited to solve language-dependent research problems in low-resource languages. Spanish and French are considered high-resource languages, with an adequate level of data resources for modeling informal online social behavior. However, no machine translation system exists that can access those data resources and transfer their context and tone to a low-resource language such as dialectal Arabic. In response, we propose a framework that uses AI to localize content from high-resource languages into low-resource languages and dialects. To the best of our knowledge, this is the first work to provide a parallel translation dataset between informal Spanish and French and informal Arabic dialects, in both directions. Using this dataset, we aim to enrich under-resourced dialectal Arabic and fast-track research on diverse online social behaviors within and across smart cities in different geo-regions. The experimental results demonstrate the capability of the proposed solution to exploit resources across high- and low-resource languages and dialects. They also show that ignoring dialects within the same language can lead to misleading analysis of online social behavior.