Handwritten document analysis is an area of forensic science whose goal is to establish the authorship of documents by examining their inherent characteristics. Law enforcement agencies use standard protocols based on manual processing of handwritten documents. This approach is time-consuming, often subjective in its evaluation, and not replicable. To overcome these limitations, we present a framework that uses image processing and deep learning techniques to extract and analyze intrinsic measures of handwritten documents: text line heights, spaces between words, and character sizes. The final feature vector for each document consists of the mean and standard deviation of every type of measure collected. Authorship is discerned by computing the Euclidean distance between the feature vectors of the documents being compared. We also introduce a new and challenging dataset of 362 handwritten manuscripts written on paper and on digital devices by 124 different people. Our study is the first to compare traditionally handwritten documents with those produced using digital tools (e.g., tablets). Experimental results demonstrate that our method can objectively determine authorship across different writing media, outperforming the state of the art.
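The comparison step described above can be illustrated with a minimal Python sketch. It assumes the three kinds of measures have already been extracted for each document (the image processing and deep learning stages are not shown), and the function names, the sample values, and the use of the population standard deviation are illustrative assumptions rather than details taken from the paper.

```python
import math
from statistics import mean, pstdev


def feature_vector(line_heights, word_spaces, char_sizes):
    """Build a 6-element vector: (mean, std) for each type of measure.

    Assumption: population standard deviation (pstdev) is used; the
    abstract does not say which estimator the authors chose.
    """
    vector = []
    for measures in (line_heights, word_spaces, char_sizes):
        vector.extend([mean(measures), pstdev(measures)])
    return vector


def euclidean_distance(vec_a, vec_b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.dist(vec_a, vec_b)


# Hypothetical measurements (e.g. in pixels) for two documents.
doc_a = feature_vector([10, 12], [4, 6], [2, 2])
doc_b = feature_vector([13, 15], [8, 10], [2, 2])

# A smaller distance suggests more similar writing characteristics.
print(euclidean_distance(doc_a, doc_b))
```

In this sketch the features are compared on their raw scale; whether the original method normalizes them before computing the distance is not stated in the abstract.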