Estimating health indicators for restricted sub-populations is a recurring challenge in epidemiology and public health. When survey data are used, Small Area Estimation (SAE) methods can improve precision by borrowing strength across domains. In many applications, however, outcomes are self-reported and affected by coarsening mechanisms, such as rounding and digit preference, that reduce data resolution and may bias inference. This paper addresses both issues by developing a Bayesian unit-level SAE framework for semi-continuous, coarsened responses. Motivated by the 2019 Italian European Health Interview Survey, we estimate smoking indicators for domains defined by the cross-classification of Italian regions and age groups, capturing both smoking prevalence and intensity. The model adopts a two-part structure: a logistic component for smoking prevalence and a flexible mixture of Lognormal distributions for average cigarette consumption, coupled with an explicit model for coarsening and topcoding. Simulation studies show that ignoring coarsening can yield biased and unstable domain estimates with poor interval coverage, whereas the proposed model improves accuracy and achieves near-nominal coverage. The empirical application provides a detailed picture of smoking patterns across region-age domains, helping to characterize the dynamics of the phenomenon and inform targeted public health policies.
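The two-part structure with coarsening and topcoding described above can be illustrated with a minimal sketch. This is not the paper's model: it assumes a single Lognormal component instead of a mixture, omits covariates and domain-level random effects, and assumes respondents round either to the nearest unit or to the nearest multiple of 5. All names and parameter values (`p_smoke`, `mu`, `sigma`, `p_heap`, `grid`, `top`) are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(y, mu, sigma):
    """CDF of a Lognormal(mu, sigma) variable, with the limits handled explicitly."""
    if y <= 0.0:
        return 0.0
    if math.isinf(y):
        return 1.0
    return normal_cdf((math.log(y) - mu) / sigma)


def simulate_unit(rng, p_smoke=0.2, mu=2.3, sigma=0.6, p_heap=0.7, grid=5, top=40):
    """Return (latent_value, reported_value) for one respondent (illustrative values)."""
    if rng.random() >= p_smoke:
        return 0.0, 0.0  # non-smoker: structural zero
    y = rng.lognormvariate(mu, sigma)  # latent average daily consumption
    if rng.random() < p_heap:
        reported = max(grid, grid * round(y / grid))  # heaping on multiples of `grid`
    else:
        reported = max(1, round(y))  # rounding to the nearest unit
    return y, float(min(reported, top))  # topcoding at `top`


def coarsening_interval(reported, width, top):
    """Set of latent values consistent with a report under a given rounding width."""
    lower = reported - width / 2.0
    upper = reported + width / 2.0
    if reported <= width:  # smallest reportable value absorbs everything below it
        lower = 0.0
    if reported >= top:  # topcoded report: latent value is only bounded below
        upper = math.inf
    return lower, upper


def likelihood(reported, p_smoke, mu, sigma, p_heap, grid=5, top=40):
    """Observed-data likelihood of one report, mixing over the unknown rounding width."""
    if reported == 0:
        return 1.0 - p_smoke
    total = 0.0
    for width, weight in ((1, 1.0 - p_heap), (grid, p_heap)):
        if reported % width != 0:
            continue  # this report cannot arise from rounding to `width`
        lo, hi = coarsening_interval(reported, width, top)
        total += weight * (lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma))
    return p_smoke * total
```

Under these assumptions, each positive report contributes the probability mass of an interval of latent values rather than a density at a point, which is the sense in which the coarsening is modelled explicitly. As a sanity check, `likelihood` summed over all reportable values 0, 1, ..., `top` should equal 1 up to floating-point error.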