Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithms

Josef Taher, Eric Hyyppä, Matti Hyyppä, Klaara Salolahti, Xiaowei Yu, Leena Matikainen, Antero Kukko, Matti Lehtomäki, Harri Kaartinen, Sopitta Thurachen, Paula Litkey, Ville Luoma, Markus Holopainen, Gefei Kong, Hongchao Fan, Petri Rönnholm, Matti Vaaja, Antti Polvivaara, Samuli Junttila, Mikko Vastaranta, Stefano Puliti, Rasmus Astrup, Joel Kostensalo, Mari Myllymäki, Maksymilian Kulicki, Krzysztof Stereńczak, Raul de Paula Pires, Ruben Valbuena, Juan Pedro Carbonell-Rivera, Jesús Torralba, Yi-Chen Chen, Lukas Winiwarter, Markus Hollaus, Gottfried Mandlburger, Narges Takhtkeshha, Fabio Remondino, Maciej Lisiewicz, Bartłomiej Kraszewski, Xinlian Liang, Jianchang Chen, Eero Ahokas, Kirsi Karila, Eugeniu Vezeteu, Petri Manninen, Roope Näsi, Heikki Hyyti, Siiri Pyykkönen, Peilun Hu, Juha Hyyppä

Climate-smart and biodiversity-preserving forestry demands precise information on forest resources, extending to the individual tree level. Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing, but challenges remain in leveraging deep learning techniques and identifying rare tree species in class-imbalanced datasets. This study addresses these gaps by conducting a comprehensive benchmark of deep learning and traditional shallow machine learning methods for tree species classification. For the study, we collected high-density multispectral ALS data ($>1000$ $\mathrm{pts}/\mathrm{m}^2$) at three wavelengths using the FGI-developed HeliALS system, complemented by existing Optech Titan data (35 $\mathrm{pts}/\mathrm{m}^2$), to evaluate the species classification accuracy of various algorithms in a peri-urban study area in southern Finland. We established a field reference dataset of 6326 segments across nine species using a newly developed browser-based crowdsourcing tool that made data annotation efficient. The ALS data, including a training dataset of 1065 segments, were shared with the scientific community to foster collaborative research and diverse algorithmic contributions. Based on 5261 test segments, our findings demonstrate that point-based deep learning methods, particularly a point transformer model, outperformed traditional machine learning and image-based deep learning approaches on high-density multispectral point clouds. For the high-density ALS dataset, the point transformer model performed best, reaching an overall accuracy of 87.9% (macro-average accuracy 74.5%) with a training set of 1065 segments, and 92.0% (85.1%) with a larger training set of 5000 segments.
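
The gap between overall and macro-average accuracy in the figures above is what one would expect on a class-imbalanced dataset. The following minimal Python sketch illustrates the two metrics, assuming that macro-average accuracy means the unweighted mean of per-species accuracy (per-class recall); the abstract does not spell out the definition. The function names and the toy labels are hypothetical and are not taken from the benchmark data.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all segments whose predicted species matches the reference."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_average_accuracy(y_true, y_pred):
    """Unweighted mean of per-species accuracy (assumed definition)."""
    totals = Counter(y_true)  # reference segments per species
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per species
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: one common species and one rare species.
y_true = ["pine"] * 8 + ["oak"] * 2
y_pred = ["pine"] * 8 + ["pine", "oak"]  # one of the two rare oaks is missed

oa = overall_accuracy(y_true, y_pred)        # 9 of 10 segments correct
ma = macro_average_accuracy(y_true, y_pred)  # mean of 8/8 (pine) and 1/2 (oak)
print(f"overall = {oa:.3f}, macro-average = {ma:.3f}")
```

In this toy case a single missed segment of the rare species barely changes the overall accuracy but pulls the macro-average down sharply, which is why the macro-average is the more informative number when rare species matter.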
