Many population surveys do not provide respondents' residential addresses and offer only coarse geographies such as ZIP codes or larger aggregations. However, fine-resolution geography is valuable for characterizing neighborhoods, especially for relatively rare populations such as immigrants. One way to obtain such information is to match on variables common to both files and link survey records to records in auxiliary databases that include residential addresses. In this research note, we present an approach based on probabilistic record linkage that matches survey participants in the Chinese Immigrants in Raleigh-Durham (ChIRDU) Study to records from InfoUSA, a provider of residential records. The two files follow different practices for romanizing Chinese names. We address this with a novel and generalizable strategy for constructing pairwise comparison vectors for romanized names. Using a fully Bayesian record linkage model, we then characterize the geospatial distribution of Chinese immigrants in the Raleigh-Durham area.
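The abstract does not say how the romanization-aware comparison vectors are built, so the following is only a rough illustration of the general idea and not the authors' method. It makes three assumptions. First, it uses a small, hypothetical table that maps Pinyin surnames to other common romanizations (Wade-Giles, Cantonese, anglicized). Second, it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for a dedicated string similarity such as Jaro-Winkler. Third, it uses four illustrative ordinal agreement levels with made-up thresholds. For each record pair, the sketch returns a comparison vector with one agreement level per name field.

```python
from difflib import SequenceMatcher

# Hypothetical, non-exhaustive variant table: Pinyin surname -> alternative
# romanizations. Wade-Giles forms with apostrophes (e.g. "Ch'en") are handled
# by normalize() below, which strips non-letters.
SURNAME_VARIANTS = {
    "zhang": {"chang", "cheung"},
    "wang": {"wong"},
    "chen": {"chan"},
    "li": {"lee", "lei"},
    "zhou": {"chou", "chow"},
}


def normalize(name: str) -> str:
    """Lowercase and keep only letters, so 'Hsiao-ming' -> 'hsiaoming'."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def variants(surname: str) -> set[str]:
    """Return the normalized surname plus any listed alternative romanizations."""
    key = normalize(surname)
    forms = {key} | SURNAME_VARIANTS.get(key, set())  # forward lookup
    for pinyin, alts in SURNAME_VARIANTS.items():      # reverse lookup
        if key in alts:
            forms |= {pinyin} | alts
    return forms


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is an assumed stand-in for Jaro-Winkler."""
    return SequenceMatcher(None, a, b).ratio()


def level(sim: float, exact: bool) -> int:
    """Map a similarity to an ordinal agreement level (illustrative thresholds)."""
    if exact:
        return 3
    if sim >= 0.85:
        return 2
    if sim >= 0.70:
        return 1
    return 0


def compare_surnames(a: str, b: str) -> int:
    va, vb = variants(a), variants(b)
    if va & vb:  # some romanization of one surname equals one of the other
        return 3
    best = max(similarity(x, y) for x in va for y in vb)
    return level(best, exact=False)


def compare_given(a: str, b: str) -> int:
    na, nb = normalize(a), normalize(b)
    return level(similarity(na, nb), exact=(na == nb))


def comparison_vector(rec_a: dict, rec_b: dict) -> tuple[int, int]:
    """Comparison vector (surname level, given-name level) for one record pair."""
    return (compare_surnames(rec_a["surname"], rec_b["surname"]),
            compare_given(rec_a["given"], rec_b["given"]))


if __name__ == "__main__":
    survey = {"surname": "Zhang", "given": "Xiaoming"}  # Pinyin
    vendor = {"surname": "Cheung", "given": "Hsiao-ming"}  # Cantonese / Wade-Giles
    print(comparison_vector(survey, vendor))
```

Run on this example pair, the sketch finds full agreement on the surname through the variant table and partial agreement on the given name. In Fellegi-Sunter-type models, including fully Bayesian versions, vectors of this kind become the data whose agreement-level distributions are modeled separately for true matches and non-matches.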