Addresses occupy a niche within the landscape of textual data because of the positional importance carried by every word and the geographical scope each address refers to. Address matching is performed every day and arises in fields such as mail redirection and entity resolution. Our work defines and formalizes a framework to generate matching and mismatching pairs of addresses in the English language, and uses it to evaluate various methods for performing address matching automatically. These methods range from distance-based approaches to deep learning models. By studying the precision, recall, and accuracy of these approaches, we obtain an understanding of the method best suited to this setting of the address matching task.
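As a rough illustration of the evaluation setting described above, the sketch below scores address pairs with a simple character-level similarity and computes precision, recall, and accuracy over labeled pairs. It is a minimal sketch under stated assumptions, not the framework or any of the methods evaluated in this work: the similarity measure (Python's `difflib.SequenceMatcher`), the `0.8` threshold, the function names, and the toy address pairs are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after lowercasing and collapsing whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def predict_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Predict a match when similarity reaches the (assumed) threshold."""
    return similarity(a, b) >= threshold


def precision_recall_accuracy(y_true, y_pred):
    """Compute precision, recall, and accuracy from boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical labeled pairs: (address_a, address_b, is_match)
toy_pairs = [
    ("12 Main Street, Springfield", "12 Main St, Springfield", True),
    ("12 Main Street, Springfield", "21 Main Street, Springfield", False),
    ("Flat 4, 9 Oak Road, Leeds", "9 Oak Road Flat 4 Leeds", True),
    ("100 King Avenue, Boston", "7 Queen Lane, Denver", False),
]

labels = [is_match for _, _, is_match in toy_pairs]
predictions = [predict_match(a, b) for a, b, _ in toy_pairs]
precision, recall, accuracy = precision_recall_accuracy(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

The second toy pair shows why positional importance matters: swapping two digits in the house number changes very few characters, so a purely character-based distance may score the pair as highly similar even though the addresses differ.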