The inclusion of human sex and gender data in statistical analysis raises multiple considerations for data collection, combination, analysis, and interpretation. These considerations are not unique to variables representing sex and gender. However, examining how the ethical practice standards for statistics and data science apply to sex and gender variables is timely, and the results can be applied to other sociocultural variables. Historically, human gender and sex have been categorized with a binary system. This tradition persists mainly because it is easy, not because it produces the best scientific information. Binary classification simplifies combining older and newer data sets. However, this classification system eliminates the ability of respondents to articulate their gender identity, conflates gender and sex, and obscures potentially important differences by collapsing across valid and authentic categories. This approach perpetuates historical inaccuracy, oversimplification, and bias, while also limiting the information that emerges from analyses of human data. The approach also violates multiple elements of the American Statistical Association (ASA) Ethical Guidelines for Statistical Practice. Information that a nonbinary classification would capture could be relevant both to decisions about analysis methods and to decisions based on otherwise expert statistical work. Statistical practitioners are increasingly concerned about inconsistent, uninformative, and even unethical data collection and analysis practices. This paper presents a historical introduction to the collection and analysis of human gender and sex data, critiques a few common survey questioning methods based on their alignment with the ASA Ethical Guidelines, and considers the scope of ethical considerations for human gender and sex data from design through analysis and interpretation.