Named entity recognition (NER) is a crucial task in online advertising. State-of-the-art solutions leverage pre-trained language models for this task, but three major challenges remain unresolved: web queries differ from the natural language on which pre-trained models are trained; web queries are short and lack contextual information; and labeled data for NER is scarce. We propose DeepTagger, a knowledge-enhanced NER model for web-based ads queries. The proposed knowledge enhancement framework combines model-free and model-based approaches. For model-free enhancement, we collect unlabeled web queries to augment domain knowledge, and we collect web search results to enrich the information in ads queries. We further use prompting methods to automatically generate labels with large language models such as ChatGPT. For model-based enhancement, we adopt adversarial data augmentation. We train DeepTagger models with a three-stage training framework. Empirical results on various NER tasks demonstrate the effectiveness of the proposed framework.
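As an illustration of the idea of enriching a short ads query with web search results, the following is a minimal sketch and not the DeepTagger implementation. The function name `enrich_query`, the `[SEP]` separator, the whitespace tokenization, and the `max_tokens` budget are all assumptions made for this example; the abstract does not specify how search results are attached to a query.

```python
from typing import List, Tuple

SEP = "[SEP]"  # assumed separator token; the abstract does not name one


def enrich_query(
    query: str, snippets: List[str], max_tokens: int = 32
) -> Tuple[List[str], List[bool]]:
    """Append search-result snippets to a short query as extra context.

    Returns the token sequence and a mask that is True only at positions
    belonging to the original query, so that a tagger can restrict its
    NER labels to those positions (hypothetical design choice).
    """
    query_tokens = query.split()  # naive whitespace tokenization
    tokens = list(query_tokens)
    is_query_token = [True] * len(query_tokens)

    for snippet in snippets:
        candidate = [SEP] + snippet.split()
        # Stop once the next snippet would exceed the token budget.
        if len(tokens) + len(candidate) > max_tokens:
            break
        tokens.extend(candidate)
        is_query_token.extend([False] * len(candidate))

    return tokens, is_query_token


if __name__ == "__main__":
    toks, mask = enrich_query(
        "nike air max 90",
        ["Nike Air Max 90 shoes", "Shop sneakers online"],
        max_tokens=12,
    )
    print(toks)
    print(mask)
```

The mask keeps the added context from receiving entity labels: the snippets only supply information that the short query lacks, while predictions are made for the query tokens alone.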