Transfer learning leverages abundant English data to address the scarcity of resources for modeling non-English languages such as Korean. In this study, we explore the potential of Phrase Aligned Data (PAD) derived from standardized Statistical Machine Translation (SMT) to improve the efficiency of transfer learning. Through extensive experiments, we demonstrate that PAD works well with the syntactic characteristics of Korean, mitigating the weaknesses of SMT and significantly improving model performance. Moreover, we show that PAD complements traditional data construction methods and enhances their effectiveness when the two are combined. The approach not only improves model performance but also suggests a cost-efficient solution for resource-scarce languages.