Data generated by users on digital platforms are a crucial resource for advocates and researchers interested in uncovering digital inequities, auditing algorithms, and understanding human behavior. Yet data access is often restricted. How can researchers both effectively and ethically collect user data? This paper shares an innovative approach to crowdsourcing user data to collect otherwise inaccessible Amazon purchase histories, spanning 5 years, from more than 5000 US users. We developed a data collection tool that prioritizes participant consent and includes an experimental study design. The design allows us to study multiple aspects of privacy perception and data sharing behavior. Experiment results (N=6325) reveal both monetary incentives and transparency can significantly increase data sharing. Age, race, education, and gender also played a role, where female and less-educated participants were more likely to share. Our study design enables a unique empirical evaluation of the "privacy paradox", where users claim to value their privacy more than they do in practice. We set up both real and hypothetical data sharing scenarios and find measurable similarities and differences in share rates across these contexts. For example, increasing monetary incentives had a 6 times higher impact on share rates in real scenarios. In addition, we study participants' opinions on how data should be used by various third parties, again finding demographics have a significant impact. Notably, the majority of participants disapproved of government agencies using purchase data yet the majority approved of use by researchers. Overall, our findings highlight the critical role that transparency, incentive design, and user demographics play in ethical data collection practices, and provide guidance for future researchers seeking to crowdsource user generated data.
翻译:用户生成的数据是数字平台上倡导者和研究人员揭露数字不平等、审计算法和理解人类行为的关键资源。然而,数据访问常常受到限制。研究人员如何既有效又合乎道德地收集用户数据?本文分享了一种创新方法,通过众包方式收集了5000多名美国用户长达5年的亚马逊购买历史数据(这些数据原本无法通过其他途径获取)。我们开发了一个优先考虑参与者知情同意且包含实验研究设计的数据收集工具。该设计使我们能够研究隐私感知和数据共享行为的多个方面。实验结果(N=6325)显示,金钱激励和透明度都能显著增加数据共享行为。年龄、种族、教育程度和性别也发挥作用,其中女性和受教育程度较低的参与者更倾向于共享数据。我们的研究设计能够对“隐私悖论”(用户声称重视隐私的程度超过实际表现)进行独到的实证评估。我们设置了真实和假设两种数据共享场景,并发现这些场景下的共享率存在可测量的相似性与差异。例如,在真实场景中,增加金钱激励对共享率的影响高出6倍。此外,我们研究了参与者对第三方如何利用数据的不同看法,再次发现人口统计学特征具有显著影响。值得注意的是,多数参与者不赞同政府机构使用购买数据,但多数人赞同研究人员使用。总体而言,我们的发现突显了透明度、激励设计和用户人口统计学特征在合乎道德的数据收集实践中的关键作用,并为未来寻求众包用户生成数据的研究人员提供了指导。