Many areas of the world lack basic information on the socioeconomic well-being of their residents because of limitations in existing data collection methods. Overhead images obtained remotely, such as from satellites or aircraft, can serve as windows into the state of life on the ground and help "fill in the gaps" where community information is sparse, with estimates at smaller geographic scales requiring higher-resolution sensors. Concurrent with improved sensor resolutions, recent advances in machine learning and computer vision have made it possible to quickly extract features from and detect patterns in image data, and to correlate these features with other information. In this work, we explore how well two approaches, a supervised convolutional neural network and semi-supervised clustering based on bag-of-visual-words, estimate the population density, median household income, and educational attainment of individual neighborhoods from publicly available high-resolution imagery of cities throughout the United States. Results and analyses indicate that features extracted from the imagery can accurately estimate the population density of neighborhoods (R$^2$ up to 0.81), with the supervised approach able to explain about half the variation in a population's income and education. In addition to the presented approaches serving as a basis for further geographic generalization, the novel semi-supervised approach provides a foundation for future work seeking to estimate fine-scale information from aerial imagery without the need for labeled data.