We propose a novel approach to learn domain-specific plausible materials for components in the vehicle repair domain by probing Pretrained Language Models (PLMs) in a cloze task style setting to overcome the lack of annotated datasets. We devise a new method to aggregate salient predictions from a set of cloze query templates and show that domain-adaptation using either a small, high-quality or a customized Wikipedia corpus boosts performance. When exploring resource-lean alternatives, we find a distilled PLM clearly outperforming a classic pattern-based algorithm. Further, given that 98% of our domain-specific components are multiword expressions, we successfully exploit the compositionality assumption as a way to address data sparsity.
翻译:我们提出了一种新颖的方法,通过以完形填空任务的形式探询预训练语言模型(PLMs),以克服标注数据集缺失的问题,从而学习车辆维修领域中零部件的领域特定合理材料。我们设计了一种新方法,从一组完形填空查询模板中聚合显著预测,并表明使用小型高质量或定制化的维基百科语料库进行领域自适应能够提升性能。在探索资源精简方案时,我们发现蒸馏后的PLM明显优于经典的基于模式的算法。此外,鉴于我们领域特定零部件中98%为多词表达,我们成功利用了组合性假设作为应对数据稀疏性的手段。