This paper presents a novel application of Generative Adversarial Networks (GANs) to the study of visual aspects of social processes. I train a StyleGAN2 model on a custom dataset of 14,564 Google Street View images of London. After training, I invert the images in the training set, finding the points in the model's latent space that correspond to them, and compare the results of three inversion techniques. I link each data point to metadata from the Indices of Multiple Deprivation describing income, health and environmental quality in the area where the photograph was taken. This makes it possible to map which parts of the model's latent space encode visual features characteristic of health, income and environmental quality, and to condition the synthesis of new images on these factors. The resulting synthetic images reflect visual features of social processes that were previously unknown and difficult to study, and they describe recurring visual differences between deprived and privileged areas of London. GANs are known for their ability to produce a continuous range of images that vary visually. The paper tests how to exploit this ability through visual comparisons of still images and through an interactive website where users guide image synthesis with sliders. Although conditioned synthesis has limitations and its results are difficult to validate, the paper points to the potential for generative models to be repurposed as components of social scientific methods.
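As an illustration of what mapping the latent space against deprivation metadata can look like, the following is a minimal sketch and not the method used in the paper. It assumes that each inverted image is a latent vector paired with a single deprivation score, and it estimates a direction as the difference between the mean latent codes of the high-score and low-score groups. The function names, the 8-dimensional toy latents and the synthetic scores are all hypothetical. In a real pipeline the shifted code would be passed to the trained StyleGAN2 generator, which is omitted here.

```python
import random


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def attribute_direction(latents, scores):
    """Unit vector pointing from the low-score group mean to the high-score group mean.

    Assumption: a simple median split on the score; this is one common
    heuristic for latent directions, not necessarily the paper's approach.
    """
    ranked = sorted(scores)
    median = ranked[len(ranked) // 2]
    high = [w for w, s in zip(latents, scores) if s >= median]
    low = [w for w, s in zip(latents, scores) if s < median]
    diff = [h - l for h, l in zip(mean_vector(high), mean_vector(low))]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def shift(latent, direction, strength):
    """Move a latent code along a direction; strength plays the role of a slider."""
    return [w + strength * d for w, d in zip(latent, direction)]


# Synthetic stand-in data: the score is deliberately encoded in dimension 2.
random.seed(0)
DIM = 8
scores = [random.random() for _ in range(200)]
latents = [
    [random.gauss(0, 1) + (3.0 * s if i == 2 else 0.0) for i in range(DIM)]
    for s in scores
]

direction = attribute_direction(latents, scores)
edited = shift(latents[0], direction, 2.0)
```

Varying `strength` continuously corresponds to the slider interaction described above: each value yields a different latent code along the same estimated direction.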