1. Citizen and community-science (CS) datasets have great potential for estimating interannual patterns of population change, given the large volumes of data collected globally every year. Yet the flexible protocols that allow many CS projects to collect such volumes typically lack the structure needed to keep sampling consistent across years. The result is interannual confounding: changes in the observation process over time are confounded with changes in species population sizes.
2. Here we describe a novel modeling approach designed to estimate species population trends while controlling for the interannual confounding common in citizen science data. The approach is based on Double Machine Learning, a statistical framework that uses machine learning methods to estimate both population change and the propensity scores used to adjust for confounding discovered in the data. Additionally, we develop a simulation method to identify and adjust for residual confounding missed by the propensity scores. Using this new method, we can produce spatially detailed trend estimates from citizen science data.
3. To illustrate the approach, we estimated species trends using data from the CS project eBird. We used a simulation study to assess the ability of the method to estimate spatially varying trends in the face of real-world confounding. Results showed that the trend estimates distinguished between spatially constant and spatially varying trends at a 27 km resolution. Error rates for the estimated direction of population change (increasing or decreasing) were low, and correlations for the estimated magnitude were high.
4. The ability to estimate spatially explicit trends while accounting for confounding in citizen science data has the potential to fill important information gaps, helping to estimate population trends for species, regions, or seasons that lack rigorous monitoring data.
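To make the idea of adjusting for interannual confounding concrete, the following is a minimal sketch of the residual-on-residual ("partialling out") form of Double Machine Learning with cross-fitting. It is not the authors' implementation. The simulated data, the single effort covariate, the binned-mean nuisance learner, and all function names (`simulate`, `fit_binned_mean`, `naive_slope`, `dml_trend`) are assumptions made for illustration only. The sketch treats year as the variable whose effect is of interest, and fits a propensity-like model of year given effort alongside a model of counts given effort.

```python
import random
from statistics import fmean


def simulate(n=4000, true_trend=-0.5, seed=1):
    """Toy data (hypothetical): observer effort drifts upward over the years,
    so it is confounded with the true population trend."""
    rng = random.Random(seed)
    year, effort, count = [], [], []
    for _ in range(n):
        t = rng.uniform(0, 10)
        e = 0.3 * t + rng.gauss(0, 1)                    # effort rises with year
        y = true_trend * t + 2.0 * e + rng.gauss(0, 1)   # count index
        year.append(t)
        effort.append(e)
        count.append(y)
    return year, effort, count


def fit_binned_mean(x, y, n_bins=20):
    """Crude nonparametric learner: mean of y within equal-width bins of x.
    Stands in for the machine learning models a real analysis would use."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)
        sums[b] += yi
        counts[b] += 1
    overall = fmean(y)
    means = [s / c if c else overall for s, c in zip(sums, counts)]

    def predict(xi):
        b = min(max(int((xi - lo) / width), 0), n_bins - 1)
        return means[b]

    return predict


def naive_slope(t, y):
    """Ordinary least-squares slope of y on t, ignoring confounders."""
    tb, yb = fmean(t), fmean(y)
    num = sum((a - tb) * (b - yb) for a, b in zip(t, y))
    return num / sum((a - tb) ** 2 for a in t)


def dml_trend(year, effort, count, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the trend (slope on year)."""
    n = len(year)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_t, res_y = [0.0] * n, [0.0] * n
    for k, held_out in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        x_train = [effort[i] for i in train]
        # Nuisance models are fitted on the other folds only.
        m_hat = fit_binned_mean(x_train, [year[i] for i in train])   # E[year | effort]
        g_hat = fit_binned_mean(x_train, [count[i] for i in train])  # E[count | effort]
        for i in held_out:
            res_t[i] = year[i] - m_hat(effort[i])
            res_y[i] = count[i] - g_hat(effort[i])
    # Regress outcome residuals on year residuals.
    return sum(a * b for a, b in zip(res_t, res_y)) / sum(a * a for a in res_t)


year, effort, count = simulate()
print("naive slope:", round(naive_slope(year, count), 3))
print("DML slope:  ", round(dml_trend(year, effort, count), 3))
```

In this toy setting the simulated trend is negative, but rising effort inflates counts in later years, so the naive slope is pulled upward. Removing the part of both year and count that effort predicts, using models fitted on held-out folds, brings the estimate back toward the simulated trend. A real analysis would use many covariates and flexible learners rather than one covariate and binned means.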