TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification

Temporal expression identification is crucial for understanding texts written in natural language. Although highly effective systems such as HeidelTime exist, their slow runtime hampers adoption in large-scale applications and production environments. In this paper, we introduce the TEI2GO models, which match HeidelTime's effectiveness with a significantly faster runtime, support six languages, and achieve state-of-the-art results in four of them. To train the TEI2GO models, we combined manually annotated reference data with ``Professor HeidelTime'', a comprehensive weakly labeled corpus of news texts that we developed by annotating the texts with HeidelTime. This corpus comprises a total of $138,069$ documents (over six languages) with $1,050,921$ temporal expressions, making it the largest open-source annotated dataset for temporal expression identification to date. By describing how the models were produced, we aim to encourage the research community to further explore, refine, and extend the set of models to additional languages and domains. Code, annotations, and models are openly available for community exploration and use. The models are also hosted on HuggingFace for straightforward integration and application.
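The weak-labeling idea mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expression `TOY_TIMEX` and the function `weak_label` are hypothetical names, and the pattern is only a toy stand-in for a rule-based tagger such as HeidelTime. The sketch shows how spans found by a rule-based annotator could be turned into token-level BIO tags for training a faster sequence model.

```python
import re

# Toy stand-in for a rule-based tagger such as HeidelTime (assumption:
# the real system uses far richer rules; this pattern is illustrative only).
TOY_TIMEX = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}|yesterday|today|tomorrow|"
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4})\b",
    re.IGNORECASE,
)


def weak_label(text):
    """Return (token, BIO tag) pairs for whitespace-separated tokens."""
    # Character spans produced by the rule-based annotator.
    spans = [m.span() for m in TOY_TIMEX.finditer(text)]
    labeled = []
    for tok in re.finditer(r"\S+", text):
        tag = "O"
        for start, end in spans:
            # A token is part of an expression if it starts inside the span.
            if start <= tok.start() < end:
                tag = "B-TIMEX" if tok.start() == start else "I-TIMEX"
                break
        labeled.append((tok.group(), tag))
    return labeled


example = weak_label("The paper appeared in March 2023 and was updated yesterday.")
for token, tag in example:
    print(token, tag)
```

Labels produced this way inherit the errors of the annotator, which is why such a corpus is described as weakly labeled and why it is combined with manually annotated data.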

