We develop a flexible approach that combines the Quadtree-based method with suppression to maximize the utility of gridded data while reducing the risk of disclosing private information about individual units. To protect data confidentiality, we produce a high-resolution grid from geo-referenced data, with a minimum cell size of 1 km nested in grids of increasingly coarser resolution, on the basis of statistical disclosure control methods (i.e., the threshold and concentration rules). Our implementation overcomes certain weaknesses of the Quadtree-based method by accounting for irregularly distributed and relatively isolated marginal units, and it also allows the joint aggregation of several variables. The method is illustrated using synthetic data of the 2020 Danish agricultural census for a set of key agricultural indicators, such as the number of agricultural holdings, the utilized agricultural area, and the number of organic farms. We demonstrate the need to assess the reliability of indicators when using a sub-sample of the synthetic data, followed by an example that applies the same approach to generate a ratio (i.e., the share of organic farming). The methodology is provided as the open-source \textit{R} package \textit{MRG}, which can be adapted for use with other geo-referenced survey data subject to confidentiality or other privacy restrictions.
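To make the idea of a threshold-driven nested grid concrete, the following is a minimal sketch of a generic Quadtree aggregation with suppression. It is not the \textit{MRG} implementation and does not use its API: the function name `quadtree_grid`, its parameters, and the default threshold of 10 units per cell are assumptions chosen for illustration. Only the threshold rule is applied; the concentration rule and the special handling of isolated marginal units described above are omitted. Coordinates are assumed to be in km, and cell sizes are assumed to double from `min_size` up to `max_size`.

```python
from collections import Counter


def quadtree_grid(points, min_size=1, max_size=8, threshold=10):
    """Return {(x0, y0, size): count} for publishable cells.

    Illustrative only: a cell is split into its four children when every
    child is either empty or holds at least `threshold` units. Cells that
    still fail the threshold at the coarsest level are suppressed (omitted).
    """
    # Pre-compute unit counts per cell at every resolution level.
    counts = {}
    size = min_size
    while size <= max_size:
        counts[size] = Counter(
            (int(x // size) * size, int(y // size) * size) for x, y in points
        )
        size *= 2

    published = {}

    def visit(x0, y0, size):
        n = counts[size][(x0, y0)]
        if n == 0:
            return  # empty cell: nothing to publish
        if size > min_size:
            half = size // 2
            children = [(x0 + dx, y0 + dy) for dx in (0, half) for dy in (0, half)]
            child_counts = [counts[half][c] for c in children]
            # Split only if no child would disclose fewer than `threshold` units.
            if all(c == 0 or c >= threshold for c in child_counts):
                for cx, cy in children:
                    visit(cx, cy, half)
                return
        if n >= threshold:
            published[(x0, y0, size)] = n
        # Otherwise the cell is suppressed: it is left out of the result.

    for x0, y0 in counts[max_size]:
        visit(x0, y0, max_size)
    return published


# Hypothetical example data: three clusters of holdings (coordinates in km).
holdings = (
    [(0.5, 0.5)] * 12 + [(1.5, 0.5)] * 3 + [(5.5, 5.5)] * 11 + [(20.5, 20.5)] * 2
)
grid = quadtree_grid(holdings, min_size=1, max_size=4, threshold=10)
print(grid)
```

In this toy example the 11 holdings near (5.5, 5.5) are published at the finest 1 km resolution, the 12 + 3 holdings near the origin are merged into a single 2 km cell because the group of 3 would fall below the threshold on its own, and the 2 isolated holdings are suppressed. The last case is the kind of isolated marginal unit that a plain Quadtree handles poorly.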