We set out a novel bottom-up procedure to aggregate, or cluster, cells with small frequency counts in a two-way classification while maintaining the dependence structure of the table. The procedure is model free. It combines cells in a table into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for imputing missing values of a continuous variable using a pair of classification variables. A useful feature of the procedure is that it forms aggregation groups that are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA groups together only independent cells, and does so in a consistent and credible way. When imputing missing data, we show that IlocA generates a close-to-optimal number of imputation cells. For ignorable non-response, the resulting imputed means are accurate in general. With non-ignorable missingness, the results are consistent with those obtained elsewhere. We close with a case study applying our method to the imputation of missing building energy performance data.
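
As a point of reference for the log odds ratios mentioned above, the following is a minimal sketch of how local log odds ratios of adjacent 2x2 subtables can be computed from a two-way table of counts. It is not the IlocA procedure itself: the function name `local_log_odds_ratios`, the example counts, and the optional continuity correction (0.5 added to every cell, a common device for sparse tables) are assumptions made purely for illustration.

```python
import numpy as np

def local_log_odds_ratios(table, correction=0.5):
    """Log odds ratios of every adjacent 2x2 subtable of a two-way count table.

    `correction` is added to each cell before taking logs so that zero or
    very small counts do not produce infinite values (an illustrative
    assumption, not something specified by the abstract).
    """
    log_t = np.log(np.asarray(table, dtype=float) + correction)
    # Entry (i, j) is log[(n_ij * n_{i+1,j+1}) / (n_{i,j+1} * n_{i+1,j})].
    return log_t[:-1, :-1] + log_t[1:, 1:] - log_t[:-1, 1:] - log_t[1:, :-1]

# Hypothetical 3x3 table: the first two rows are proportional, so the
# local log odds ratios involving only those rows are zero (independence).
counts = np.array([[10, 20, 5],
                   [20, 40, 10],
                   [3, 1, 12]])
lor = local_log_odds_ratios(counts, correction=0.0)
print(lor)
```

A local log odds ratio close to zero indicates that the four cells involved behave as if the row and column classifications were independent there, which is the kind of evidence a cell-aggregation rule could draw on.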