CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models

Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and the technical expertise to deploy these models. This can be a significant obstacle for language community members and linguists who want to use NLP tools. This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models. CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages, even with limited training data. We describe the tools and APIs that are currently available and how developers can easily add new models and functionality to the framework. Code is available at https://github.com/neulab/cmulab, along with a live demo at https://cmulab.dev.

