This thesis addresses challenges in semantic parsing, focusing on scenarios with limited data and computational resources. It proposes solutions based on automatic data curation, knowledge transfer, active learning, and continual learning. For tasks with no parallel training data, the thesis generates synthetic training examples from structured database schemas. When data is abundant in a source domain but parallel data is limited in a target domain, knowledge from the source domain is leveraged to improve parsing in the target domain. For multilingual settings with limited data in the target languages, the thesis introduces a method for adapting parsers under a limited human translation budget: active learning selects the source-language samples to be translated manually, so as to maximize parser performance in the target language. An alternative method uses machine translation services, supplemented by human-translated data, to train a more effective parser. When computational resources are limited, a continual learning approach is introduced to minimize training time and memory usage. This approach preserves the parser's performance on previously learned tasks while adapting it to new tasks, mitigating catastrophic forgetting. Overall, the thesis provides a comprehensive set of methods for improving semantic parsing in resource-constrained conditions.