An important application scenario of precision agriculture is detecting and measuring crop health threats using sensors and data analysis techniques. However, textual data remain under-explored in existing solutions due to the lack of labelled data and fine-grained semantic resources. Recent research suggests that the increasing connectivity of farmers and the emergence of online farming communities make social media such as Twitter a participatory platform for detecting unfamiliar plant health events, provided that essential information can be extracted from unstructured textual data. ChouBERT is a French pre-trained language model that can identify tweets concerning observations of plant health issues and generalizes to unseen natural hazards. This paper tackles the lack of labelled data by further studying ChouBERT's ability to perform token-level annotation tasks with small labelled sets.