Accurate neural models are far less efficient than non-neural models, which makes them impractical for processing billions of social media posts or handling user queries in real time on a limited budget. This study revisits the fastest pattern-based NLP methods and makes them as accurate as possible, yielding a strikingly simple yet surprisingly accurate morphological analyzer for Japanese. The proposed method induces reliable patterns from a morphological dictionary and annotated data. Experimental results on two standard datasets confirm that the method achieves accuracy comparable to learning-based baselines while reaching a throughput of over 1,000,000 sentences per second on a single modern CPU. The source code is available at https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/
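To make the idea of inducing patterns from a dictionary and annotated data more concrete, the following toy sketch may help. It is not the authors' implementation: it assumes that a pattern is simply a surface string mapped to its most frequent tag, and that analysis is greedy longest-match, with no surrounding context taken into account. The function names `induce_patterns` and `analyze`, the tag labels, and the `UNK` fallback are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def induce_patterns(dictionary, annotated):
    """Build a surface -> tag table (a simplifying assumption, not the paper's method).

    dictionary: iterable of (surface, tag) pairs, used as a fallback.
    annotated:  iterable of sentences, each a list of (surface, tag) pairs.
    """
    # Count how often each surface form carries each tag in the annotated data.
    counts = defaultdict(Counter)
    for sentence in annotated:
        for surface, tag in sentence:
            counts[surface][tag] += 1

    patterns = {}
    # Dictionary entries give a default tag (first entry wins for a surface).
    for surface, tag in dictionary:
        patterns.setdefault(surface, tag)
    # Annotated data overrides the dictionary with the most frequent tag.
    for surface, tags in counts.items():
        patterns[surface] = tags.most_common(1)[0][0]
    return patterns


def analyze(text, patterns, unknown_tag="UNK"):
    """Segment and tag text by greedy longest-match over the pattern table."""
    max_len = max((len(s) for s in patterns), default=1)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in patterns:
                result.append((piece, patterns[piece]))
                i += length
                break
        else:
            # No pattern matched: emit a single character as unknown.
            result.append((text[i], unknown_tag))
            i += 1
    return result


if __name__ == "__main__":
    dictionary = [("東京", "NOUN"), ("東京都", "NOUN"), ("に", "PART"), ("住む", "VERB")]
    annotated = [[("に", "ADP"), ("住む", "VERB")]]
    patterns = induce_patterns(dictionary, annotated)
    print(analyze("東京都に住む", patterns))
```

In this sketch the annotated data takes priority over the dictionary, so a surface form seen in training keeps its most frequent observed tag. A dictionary lookup per candidate substring is the simplest possible matcher; a trie or similar structure would be the natural choice when throughput matters.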