fugashi, a Tool for Tokenizing Japanese in Python - 专知 Papers

Tokenizer · Python · Processing (programming language) · NLP · Natural language processing

October 14, 2020

fugashi, a Tool for Tokenizing Japanese in Python

From arXiv. Accepted at the NLP-OSS workshop at EMNLP 2020.

Recent years have seen an increase in the number of large-scale multilingual NLP projects. However, even in such projects, languages with special processing requirements are often excluded. One such language is Japanese. Japanese is written without spaces, tokenization is non-trivial, and while high quality open source tokenizers exist they can be hard to use and lack English documentation. This paper introduces fugashi, a MeCab wrapper for Python, and gives an introduction to tokenizing Japanese.
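To make the "written without spaces" point concrete, here is a small pure-Python sketch of why segmentation is non-trivial. It is not fugashi's API and not MeCab's implementation: the five-word lexicon, the part-of-speech tags, the cost values, and the function names `greedy_longest_match` and `viterbi_tokenize` are all assumptions made up for illustration. The idea it shows, choosing the cheapest path through all dictionary matches rather than grabbing the longest word at each step, is the general approach that dictionary-based tokenizers such as MeCab are usually described as taking, with a real dictionary and learned costs in place of these toy numbers.

```python
from math import inf

# Toy lexicon (assumption): surface form -> (part of speech, word cost).
LEXICON = {
    "すもも": ("noun", 1),
    "もも": ("noun", 1),
    "うち": ("noun", 1),
    "も": ("particle", 1),
    "の": ("particle", 1),
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def connection_cost(prev_pos, pos):
    """Toy rule (assumption): two adjacent words with the same tag are penalized."""
    if prev_pos is None:  # start of the text
        return 0
    return 5 if prev_pos == pos else 0


def greedy_longest_match(text):
    """Baseline: always take the longest dictionary word at the current offset."""
    tokens, start = [], 0
    while start < len(text):
        for length in range(MAX_WORD_LEN, 0, -1):
            surface = text[start:start + length]
            if surface in LEXICON:
                tokens.append(surface)
                start += len(surface)
                break
        else:
            raise ValueError(f"no dictionary word at offset {start}")
    return tokens


def viterbi_tokenize(text):
    """Return the segmentation with the lowest total word + connection cost."""
    # best[i][pos] = (cost, (start, prev_pos)) for the cheapest path that
    # ends at offset i with a word tagged `pos`.
    best = [dict() for _ in range(len(text) + 1)]
    best[0][None] = (0, None)
    for start in range(len(text)):
        for prev_pos, (prev_cost, _) in best[start].items():
            for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
                surface = text[start:end]
                if surface not in LEXICON:
                    continue
                pos, word_cost = LEXICON[surface]
                cost = prev_cost + word_cost + connection_cost(prev_pos, pos)
                if cost < best[end].get(pos, (inf,))[0]:
                    best[end][pos] = (cost, (start, prev_pos))
    if not best[len(text)]:
        raise ValueError("text cannot be segmented with the toy lexicon")
    # Follow the back-pointers from the cheapest final state.
    pos = min(best[-1], key=lambda p: best[-1][p][0])
    end, tokens = len(text), []
    while end > 0:
        _, (start, prev_pos) = best[end][pos]
        tokens.append(text[start:end])
        end, pos = start, prev_pos
    return tokens[::-1]


if __name__ == "__main__":
    sentence = "すもももももももものうち"
    print(greedy_longest_match(sentence))
    print(viterbi_tokenize(sentence))
```

On this sentence the greedy baseline returns `すもも / もも / もも / もも / の / うち`, swallowing the particles, while the cost-based search returns `すもも / も / もも / も / もも / の / うち`. With real text the lexicon has hundreds of thousands of entries and many more ambiguities, which is why a maintained tokenizer and dictionary are used in practice instead of hand-written rules like these.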
