This study aims to predict whether a person will complete a drug and alcohol rehabilitation program and how many times that person attends. It uses demographic data from the Substance Abuse and Mental Health Services Administration (SAMHSA), drawn from both admissions and discharge records of drug and alcohol rehabilitation centers in Oklahoma. Because demographic data is highly categorical, binary encoding was used, and several fairness measures were applied to mitigate bias across nine demographic variables. Support vector machines with linear, polynomial, sigmoid, and radial basis function kernels were compared over a range of parameter values to find the optimal settings. These models were then compared with decision trees, random forests, and neural networks. The Synthetic Minority Oversampling Technique for Nominal data (SMOTEN) was used to balance the categorical data, and missing values were imputed. The nine bias variables were then intersectionalized to mitigate bias, and their dual and triple interactions were integrated so that the resulting probabilities could be used for worst-case ratio fairness mitigation. Disparate Impact, Statistical Parity Difference, Conditional Statistical Parity Ratio, Demographic Parity, Demographic Parity Ratio, Equalized Odds, Equalized Odds Ratio, Equal Opportunity, and Equal Opportunity Ratio were all explored in both the binary and multiclass settings.
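To make the idea of a worst-case ratio over intersectional groups concrete, the following is a minimal sketch and not the study's implementation. It assumes the ratio is the lowest positive-prediction rate divided by the highest across groups, in the style of a demographic parity ratio. The attribute names (`sex`, `age`), the `pred` field, and the toy records are all hypothetical, and the sketch covers single attributes and pairs only, whereas the study also considers triple interactions.

```python
from collections import defaultdict
from itertools import combinations


def selection_rates(records, attrs, pred_key="pred"):
    """Positive-prediction rate for each group defined by the given attributes."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in records:
        group = tuple(row[a] for a in attrs)
        counts[group][0] += row[pred_key]
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def worst_case_parity_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical toy predictions (1 = predicted to complete the program).
records = [
    {"sex": "F", "age": "young", "pred": 1},
    {"sex": "F", "age": "young", "pred": 0},
    {"sex": "F", "age": "old", "pred": 1},
    {"sex": "F", "age": "old", "pred": 1},
    {"sex": "M", "age": "young", "pred": 1},
    {"sex": "M", "age": "young", "pred": 1},
    {"sex": "M", "age": "old", "pred": 0},
    {"sex": "M", "age": "old", "pred": 1},
]

protected = ["sex", "age"]  # assumed protected attributes
results = {}
for k in (1, 2):  # single attributes, then pairwise intersections
    for attrs in combinations(protected, k):
        rates = selection_rates(records, attrs)
        results[attrs] = worst_case_parity_ratio(rates)

for attrs, ratio in results.items():
    print(attrs, round(ratio, 3))
```

In this toy data each attribute looks balanced on its own, while the pairwise intersection shows a disparity. That is the kind of effect an intersectional analysis is meant to surface.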