A line of work on Transformer-based language models such as BERT has attempted to use syntactic inductive bias to enhance the pretraining process, on the theory that building syntactic structure into the training process should reduce the amount of data needed for training. However, such methods are often tested on high-resource languages such as English. In this work, we investigate whether these methods can compensate for data sparseness in low-resource languages, hypothesizing that they ought to be more effective there. We experiment with five low-resource languages: Uyghur, Wolof, Maltese, Coptic, and Ancient Greek. We find that these syntactic inductive bias methods produce uneven results in low-resource settings and provide surprisingly little benefit in most cases.