This study aims to fill the gap by constructing a topic-aware comparable corpus of Mainland Chinese Mandarin and Taiwanese Mandarin, drawn from social media in Mainland China and Taiwan, respectively. Using Sina Weibo for Mainland Chinese Mandarin and Dcard for Taiwanese Mandarin, we create a comparable corpus that is updated regularly and reflects contemporary language use on social media.