Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X chromosome is more complex due to sex-specific genotype structures and potential sex differences in minor allele frequency (sdMAF). Existing tests differ in their assumptions about sdMAF and male sample inclusion, often leading to distinct but poorly characterized null hypotheses. We develop a general statistical framework for HWE inference using the robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sdMAF. The framework also accommodates covariate and population-structure adjustment within a unified regression-based formulation. The proposed framework provides robust, interpretable, and flexible inference, establishing a unified statistical foundation for HWE testing across autosomal and X-chromosomal regions. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
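For orientation, the classical autosomal Pearson chi-square test that the framework is said to unify can be sketched as follows. This is a minimal illustration of the standard 1-degree-of-freedom test, not the allele-based regression model proposed here, and it does not handle the X chromosome or sdMAF. The function name `hwe_chisq_autosomal` and its arguments are hypothetical, chosen only for this example. The returned `delta` is the usual Hardy-Weinberg disequilibrium coefficient (observed homozygote frequency minus its expectation under HWE), and for a biallelic locus the statistic equals `n * delta**2 / (p**2 * q**2)`.

```python
import math


def hwe_chisq_autosomal(n_aa, n_ab, n_bb):
    """Classical Pearson chi-square HWE test (1 df) from autosomal genotype counts.

    Hypothetical helper for illustration; counts are for genotypes AA, AB, BB.
    Returns (chi-square statistic, disequilibrium coefficient, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    p = (2 * n_aa + n_ab) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic locus: the test is undefined")

    # Expected genotype counts under Hardy-Weinberg proportions
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Disequilibrium coefficient: excess of AA homozygotes over p^2
    delta = n_aa / n - p * p

    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, delta, p_value


if __name__ == "__main__":
    # Heterozygote deficit relative to HWE proportions
    print(hwe_chisq_autosomal(40, 20, 40))
```

The p-value uses the closed form for one degree of freedom so that the sketch needs only the standard library. Applying this autosomal test unchanged to X-chromosomal genotypes would ignore male hemizygosity and any sex difference in allele frequency, which is the setting the abstract identifies as problematic.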