Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of the incentives, including their number, value, call to action, etc. Standard RDS uses an incentive structure that is set a priori and held fixed throughout the study. Thus, it does not make use of accumulating information on which incentives are effective and for whom. We propose a reinforcement learning (RL) based adaptive RDS study design in which the incentives are tailored over time to maximize cumulative utility during the study. We show that these designs are more efficient and cost-effective than standard RDS and can generate new insights into the social structure of hidden populations. In addition, we develop methods for valid post-study inference, which is non-trivial because of the adaptive sampling induced by RL and the complex dependence among subjects arising from the latent (unobserved) social network structure. We provide asymptotic regret bounds and illustrate the design's finite-sample behavior through a suite of simulation experiments.
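To make the adaptive idea concrete, the following is a minimal sketch, not the paper's actual design: it casts incentive selection as a multi-armed bandit solved by Thompson sampling, assuming (hypothetically) a fixed menu of incentive levels and a binary recruitment outcome per coupon, with no network structure modeled.

```python
import random

def thompson_select(successes, failures):
    """Pick the incentive arm with the highest draw from its Beta posterior.

    successes[i]/failures[i] count recruited vs. unredeemed coupons for arm i;
    a Beta(s+1, f+1) posterior corresponds to a uniform prior on each arm.
    """
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run_study(true_probs, n_coupons, seed=0):
    """Simulate adaptive incentive assignment over n_coupons coupons.

    true_probs is a hypothetical recruitment probability per incentive level,
    unknown to the sampler; returns total recruits and per-arm counts.
    """
    random.seed(seed)
    k = len(true_probs)
    succ, fail = [0] * k, [0] * k
    recruits = 0
    for _ in range(n_coupons):
        arm = thompson_select(succ, fail)       # adaptively choose an incentive
        if random.random() < true_probs[arm]:   # coupon redeemed: a recruit
            succ[arm] += 1
            recruits += 1
        else:
            fail[arm] += 1
    return recruits, succ, fail
```

Unlike standard RDS, which would allocate all coupons to one pre-chosen incentive, this sketch shifts coupons toward incentives that are observed to recruit well; the paper's design additionally accounts for network dependence, which this toy example omits.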