Much of the research in social computing analyzes data from social media platforms, which may carry inherent biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which may not accurately mirror global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference, the only venue whose proceedings are fully dedicated to social computing research. We analyzed 494 papers published from 2018 to 2022, including full research papers, dataset papers, and posters. After filtering out papers that analyze synthetic datasets or lack a clear country of origin, 420 papers remained. From these, 188 participants in a crowdsourcing study, whose extractions were fully manually validated, extracted the data needed to compute WEIRD scores. We then used this data to adapt existing WEIRD metrics to social media data. We found that 37% of these papers focused solely on data from Western countries. This share is significantly lower than those observed in research at CHI (76%) and FAccT (84%), suggesting a greater diversity of dataset origins at ICWSM. However, compared with FAccT, ICWSM studies still predominantly examine populations from countries that are more Educated, Industrialized, and Rich. The 'Democratic' dimension, which reflects political freedoms and rights, warrants special note: it points to the utility of social media data for shedding light on countries with restricted political freedoms. Based on these insights, we recommend extending current "paper checklists" to include considerations of WEIRD bias, and we call on the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.
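To make the headline statistic concrete: a paper counts as Western-only when every country its data comes from is classified as Western. The sketch below is illustrative only. The function `western_only_share`, the `WESTERN` country set, and the toy paper data are all hypothetical; this is not the study's pipeline or its country classification.

```python
from typing import Dict, Iterable, Set

# Hypothetical subset of "Western" country codes, for illustration only;
# the study's actual country classification is not reproduced here.
WESTERN: Set[str] = {"US", "GB", "DE", "FR", "CA", "AU"}


def western_only_share(papers: Dict[str, Iterable[str]],
                       western: Set[str] = WESTERN) -> float:
    """Return the fraction of papers whose dataset countries are all Western.

    Papers without an identifiable country are assumed to have been
    filtered out beforehand (cf. the filtering step in the abstract).
    """
    if not papers:
        raise ValueError("no papers to score")
    count = 0
    for countries in papers.values():
        country_set = set(countries)
        # A paper counts only if it has at least one country and
        # every one of its dataset countries is Western.
        if country_set and country_set <= western:
            count += 1
    return count / len(papers)


# Toy input (invented data): paper id -> countries its datasets come from.
toy = {
    "p1": ["US"],
    "p2": ["US", "IN"],
    "p3": ["BR"],
    "p4": ["GB", "DE"],
}
print(western_only_share(toy))  # 0.5
```

Here `p1` and `p4` are Western-only, while `p2` (mixed) and `p3` (non-Western) are not, giving 2 of 4 papers. The subset test (`<=`) encodes the "solely Western" criterion; a paper with even one non-Western data source is excluded.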