National statistical offices (NSOs) produce their estimates under a single weighting system (the uni-weight approach): one set of weights, independent of the variable of interest, is used to estimate multiple parameters across multiple subpopulations (domains). In this paper we study, within the family of model-assisted estimators and from the design-based perspective of direct estimation, the use of regression trees as the assisting model for estimating totals in unplanned domains. We distinguish two strategies: (i) fitting a single tree at the population level and deriving from it a single set of weights applicable to any domain, and (ii) fitting a domain-specific tree. We show that both estimators can be written as weighted sums with weights that do not depend on $y$, thereby preserving the uni-weight property and additivity (benchmarking) with respect to the population total. Extending to trees a classical result for model-assisted domain estimators, we argue why the estimator built from a population-level model tends to behave like the Horvitz-Thompson estimator within domains, whereas the domain-specific model can achieve substantial variance reductions. A simulation study based on microdata from the Uruguayan Continuous Household Survey (ECH) illustrates the behavior of the estimators at the population level and by department.
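To make the "weighted sum with weights that do not depend on $y$" claim concrete for strategy (i), here is a minimal sketch under assumptions that the abstract does not state: the population-level tree has already been fitted, so each sample unit's leaf is known; the tree's prediction in each leaf is the design-weighted leaf mean; and the population leaf sizes $N_l$ are known from auxiliary data. Under these assumptions the model-assisted estimator $\sum_U \hat m(x_k) + \sum_s d_k\,(y_k - \hat m(x_k))$ collapses to a post-stratification-type estimator with weights $w_k = d_k N_l / \hat N_l$, where $\hat N_l = \sum_{s \cap l} d_k$. The function names and toy data below are hypothetical illustrations, not the paper's implementation, and strategy (ii) is not covered.

```python
from collections import defaultdict


def leaf_weights(d, leaf, N_leaf):
    """Hypothetical helper: w_k = d_k * N_l / Nhat_l for the leaf l containing k.

    Nhat_l is the Horvitz-Thompson estimate of leaf l's size. Only design
    weights and leaf membership enter, so the weights never depend on y
    (once the tree partition is fixed)."""
    Nhat = defaultdict(float)
    for dk, l in zip(d, leaf):
        Nhat[l] += dk
    return [dk * N_leaf[l] / Nhat[l] for dk, l in zip(d, leaf)]


def model_assisted_total(y, d, leaf, N_leaf):
    """sum_U m(x_k) + sum_s d_k (y_k - m(x_k)), m = design-weighted leaf mean.

    Assumes every leaf in N_leaf contains at least one sample unit."""
    num, Nhat = defaultdict(float), defaultdict(float)
    for yk, dk, l in zip(y, d, leaf):
        num[l] += dk * yk
        Nhat[l] += dk
    m = {l: num[l] / Nhat[l] for l in Nhat}
    predicted = sum(N_leaf[l] * m[l] for l in N_leaf)
    residual = sum(dk * (yk - m[l]) for yk, dk, l in zip(y, d, leaf))
    return predicted + residual


def domain_total(w, y, domain, target):
    """Uni-weight domain estimate: sum of w_k * y_k over sample units in `target`."""
    return sum(wk * yk for wk, yk, dom in zip(w, y, domain) if dom == target)


# Hypothetical toy sample: 6 units, 3 leaves, 2 domains (not ECH data).
d = [10, 10, 20, 20, 15, 15]            # design weights 1/pi_k
leaf = [0, 0, 1, 1, 2, 2]               # leaf of the (already fitted) population tree
N_leaf = {0: 25, 1: 30, 2: 40}          # assumed known population leaf sizes
domain = ["A", "B", "A", "B", "A", "B"]
y = [3.0, 5.0, 2.0, 4.0, 6.0, 1.0]
y2 = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]     # a second variable reusing the same weights

w = leaf_weights(d, leaf, N_leaf)       # [12.5, 12.5, 15.0, 15.0, 20.0, 20.0]
t_pop = sum(wk * yk for wk, yk in zip(w, y))
t_A = domain_total(w, y, domain, "A")
t_B = domain_total(w, y, domain, "B")

print(t_pop, model_assisted_total(y, d, leaf, N_leaf))  # 330.0 330.0
print(t_A, t_B, t_A + t_B)                               # 187.5 142.5 330.0
```

Because `w` is built from the design weights, leaf membership and $N_l$ alone, the same vector can be applied to `y`, `y2` or any other variable and to any domain, and the domain estimates add up to the population estimate. In this sketch the within-domain weights are calibrated only to the leaf totals, not to domain-level totals, which is consistent with the abstract's point that the population-level estimator tends to behave like Horvitz-Thompson inside domains.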