Testing judicial impartiality is a problem of fundamental importance in empirical legal studies, and standard regression methods have been widely used to estimate the effects of extralegal factors. However, those methods cannot handle control variables of ultrahigh dimensionality, such as those found in judgment documents recorded in text format. To solve this problem, we develop a novel mixture conditional regression (MCR) approach, which assumes that the whole sample can be classified into a number of latent classes. Within each latent class, a standard linear regression model describes the relationship between the response and a key feature vector, which is assumed to be of fixed dimension. Meanwhile, the ultrahigh dimensional control variables are used to determine the latent class membership through a naïve Bayes type model. Hence, the dimension of the control variables is allowed to be arbitrarily high. A novel expectation-maximization algorithm is developed for model estimation. With it, the key parameters of interest can be estimated as efficiently as if the true class membership were known in advance. Simulation studies are presented to demonstrate the proposed MCR method. A real dataset of Chinese burglary offenses is analyzed for illustration.
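To make the model structure described in the abstract concrete, the following is a minimal sketch of a generic EM fit for a mixture of linear regressions whose class membership is informed by a naïve Bayes model on the control variables. It is not the authors' algorithm. Several details are assumptions made only for illustration: the control variables are binary (for example, word-presence indicators from the judgment text), the regression errors are Gaussian with class-specific variance, the Bernoulli parameters are Laplace-smoothed, and the responsibilities are initialized at random. The function name `fit_mcr` and all parameter names are hypothetical.

```python
import numpy as np

def fit_mcr(y, X, Z, n_classes=2, n_iter=50, seed=0, eps=1e-6):
    """Illustrative EM for a mixture of regressions with naive Bayes controls.

    y: (n,) response; X: (n, d) key features; Z: (n, p) binary controls.
    Assumed model (not taken from the paper): given class k,
    y ~ N(X beta_k, sigma2_k) and Z_j ~ Bernoulli(theta_kj) independently.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random soft initialization of the responsibilities (an assumption).
    R = rng.dirichlet(np.ones(n_classes), size=n)  # shape (n, K)
    for _ in range(n_iter):
        # M-step: update class weights and naive Bayes parameters.
        Nk = R.sum(axis=0) + eps
        pi = Nk / Nk.sum()
        theta = (R.T @ Z + 1.0) / (Nk[:, None] + 2.0)  # Laplace smoothing
        # M-step: weighted least squares within each latent class.
        beta = np.empty((n_classes, d))
        sigma2 = np.empty(n_classes)
        for k in range(n_classes):
            w = R[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(Xw.T @ X + eps * np.eye(d), Xw.T @ y)
            resid_k = y - X @ beta[k]
            sigma2[k] = (w * resid_k**2).sum() / Nk[k] + eps
        # E-step: combine both likelihood parts on the log scale.
        log_nb = Z @ np.log(theta).T + (1.0 - Z) @ np.log1p(-theta).T
        resid = y[:, None] - X @ beta.T
        log_reg = -0.5 * (np.log(2.0 * np.pi * sigma2) + resid**2 / sigma2)
        log_r = np.log(pi) + log_nb + log_reg
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)
    return {"pi": pi, "theta": theta, "beta": beta, "sigma2": sigma2, "resp": R}
```

In this sketch the dimension `p` of `Z` enters only through per-feature Bernoulli parameters and matrix products, so it can be much larger than the sample size, while the regression step always works with the fixed-dimension `X`. Class labels are identified only up to permutation, and a random start may converge to a local optimum, so several seeds would normally be compared.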