We introduce MathWriting, the largest online handwritten mathematical expression (HME) dataset to date. It consists of 230k human-written samples and an additional 400k synthetic ones. MathWriting can also be used for offline HME recognition and is larger than all existing offline HME datasets, such as IM2LATEX-100K. We also introduce a benchmark based on MathWriting data to advance research on both online and offline HME recognition.