Modeling and estimation of spatial data are ubiquitous in practice, appearing frequently in weather forecasting, pollution detection, and agriculture. Spatial data analysis often involves processing datasets of enormous scale. In this work, we focus on large-scale open datasets of internet quality from Ookla and study the estimation of mobile (cellular) internet quality at the scale of a U.S. state. In particular, we aim to perform estimation from highly {\it imbalanced} data: most samples are concentrated in a few limited areas, while very few are available elsewhere, which poses significant challenges for modeling. We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of this data imbalance. Through comparative experiments on two distinct mobile network measurement datasets, we demonstrate that the proposed self-tuning kernel regression method produces more accurate predictions and has the potential to be applied in other domains.
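The abstract does not specify the form of the self-tuning kernel. Purely as an illustration of why a locally adaptive kernel helps with imbalanced spatial samples, the sketch below assumes one common notion of self-tuning: a Gaussian kernel whose scale at each sample is the distance to its k-th nearest neighbor (the local-scaling idea from self-tuning spectral clustering), plugged into Nadaraya-Watson regression. The names `SelfTuningKernelRegressor`, `kth_neighbor_distance`, and the parameter `k` are hypothetical, and this is not necessarily the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: projected planar coordinates, e.g. (x_km, y_km)


def kth_neighbor_distance(p: Point, points: Sequence[Point], k: int,
                          exclude_self: bool) -> float:
    """Euclidean distance from p to its k-th nearest neighbor in `points`."""
    dists = sorted(math.dist(p, q) for q in points)
    # If p is itself one of `points`, dists[0] == 0 is p, so shift the index by one.
    idx = min(k if exclude_self else k - 1, len(dists) - 1)
    return max(dists[idx], 1e-12)  # avoid a zero scale on duplicated locations


class SelfTuningKernelRegressor:
    """Hypothetical Nadaraya-Watson regressor with a locally scaled Gaussian kernel.

    Each training point x_i gets its own scale sigma_i (distance to its k-th
    neighbor), so the kernel is narrow where samples are dense and wide where
    they are sparse.
    """

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.X: List[Point] = []
        self.y: List[float] = []
        self.sigma: List[float] = []

    def fit(self, X: Sequence[Point], y: Sequence[float]) -> "SelfTuningKernelRegressor":
        self.X, self.y = list(X), list(y)
        self.sigma = [kth_neighbor_distance(p, self.X, self.k, exclude_self=True)
                      for p in self.X]
        return self

    def predict(self, x: Point) -> float:
        # The query also gets its own scale, so K(x, x_i) = exp(-d^2 / (sigma_x * sigma_i)).
        sigma_x = kth_neighbor_distance(x, self.X, self.k, exclude_self=False)
        logw = [-math.dist(x, xi) ** 2 / (sigma_x * si)
                for xi, si in zip(self.X, self.sigma)]
        top = max(logw)  # subtract the max before exp() to avoid underflow far from data
        w = [math.exp(lw - top) for lw in logw]
        return sum(wi * yi for wi, yi in zip(w, self.y)) / sum(w)


# Usage: a dense cluster (value 10) next to a few sparse samples (value 50).
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
sparse = [(10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
model = SelfTuningKernelRegressor(k=3).fit(dense + sparse, [10.0] * 5 + [50.0] * 3)
print(round(model.predict((10.0, 5.0)), 3))  # → 50.0
```

The design choice this illustrates: because the dense cluster's samples have tiny scales, their kernels decay quickly and do not dominate predictions in the sparse region, whereas a single global bandwidth would have to trade off over-smoothing the dense area against under-covering the sparse one.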