Extraction of land use information in karst areas based on Sentinel-2 images
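Summary: To address the difficulty of determining the classification scale, the excessively high feature dimensionality, and the low classification accuracy in karst areas, this article proposes a method that determines the optimal segmentation scale through joint evaluation, selects features from a prior feature dataset with the ReliefF algorithm, adopts a stratified masking strategy, and completes the classification with the random forest algorithm. Taking the karst area southwest of Guiyang as the study area, the optimal segmentation scale was first determined to be 80 by a joint evaluation of homogeneity and Moran's I, and the 15 most important features were selected with the ReliefF algorithm. On this basis, comparative experiments verified the superiority of the random forest algorithm, and three object-oriented classification schemes were designed with 2020 Sentinel-2 imagery as experimental data. The results show that the classification method combining optimal scale calculation, feature selection, and stratified masking achieved the highest accuracy, with overall accuracy, Kappa coefficient, AD, and QD of 0.886, 0.849, 0.092, and 0.022, respectively. Finally, the method was applied to 2023 imagery, where overall accuracy, Kappa coefficient, AD, and QD reached 0.868, 0.825, 0.106, and 0.026, respectively, demonstrating the superiority and applicability of the method for land use information extraction in karst areas.
Abstract: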
Accurate land use information is the foundation of land management. Remote sensing data, which are easy to acquire, low in cost, and efficient to process, have been widely combined with machine learning algorithms in land use classification research in China and abroad. Karst landforms are widely distributed in Southwest China. The ecological environment of this region is fragile because of its rugged terrain, strong surface relief, and fragmented land plots. In addition, under the long-term influence of this topography, the level of land use in the region is relatively low and its economic development remains sluggish. Accurate land use information is crucial for land resource management and planning in karst areas, but the complex terrain and fragmented plots make it difficult to extract. Therefore, building on previous research, this study selected the southwestern part of Guiyang City, Guizhou Province, a karst area with complex terrain and fragmented land plots, as the study area. With Sentinel-2 satellite imagery as the basic data, the optimal object-oriented segmentation scale was calculated, the ReliefF algorithm was used to select the features input to the random forest algorithm, and land cover in images from different years was classified with a stratified classification strategy. The proposed method determines the optimal segmentation scale through joint evaluation, selects features from the prior feature dataset with the ReliefF algorithm, and carries out classification with the random forest algorithm and a stratified masking strategy. First, the optimal segmentation scale was determined to be 80 by a joint evaluation combining homogeneity and Moran's I.
Subsequently, the ReliefF algorithm was employed to rank the importance of the initial features, and the top 15 features were selected. On this basis, the superiority of the random forest algorithm was verified by comparison with the K-nearest neighbor and decision tree algorithms. Then, taking Sentinel-2 images from 2022 as experimental data, this study designed three object-oriented classification schemes to validate the combination of optimal segmentation scale, feature selection, and stratified masking for land use information extraction in karst areas. With the same samples, Model A ran the random forest algorithm without feature selection, using all 25 features (user-defined, spectral, shape, and texture features). Model B used the top 15 features ranked by the ReliefF algorithm as input to the random forest classification. Model C used the same top 15 features and added stratified masking to the random forest classification. Following the principle of starting with the easiest areas, non-vegetation areas were classified first. After the classified areas had been masked out, the vegetation areas were classified, and finally the classification results were merged. The method was then applied to images from 2023 to verify its applicability to land use information extraction in karst areas. The following conclusions can be drawn from the experiments. Using the ReliefF algorithm to optimize the 25 classification features can effectively improve classification accuracy and efficiency when the training and validation samples are the same. In this study, the overall accuracy of Model B after feature selection was 6.2% higher than that of Model A with the original feature dataset, and Kappa was 0.081 higher.
Multi-scale segmentation is the foundation of object-oriented classification and can avoid the "salt and pepper" phenomenon. Evaluating segmentation quality with homogeneity and heterogeneity indices indicates that the optimal segmentation scale is 80, an approach that reduces the subjectivity of manual visual inspection. The random forest algorithm performs well in extracting land use information across different types of regions. Combining it with stratified masking further reduces the interference of already classified features with unclassified ones. The stratified masking method achieved the highest overall accuracy, 88.6%.
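The stratified masking workflow described in the abstract (classify the easier non-vegetation areas, mask them out, classify the remaining vegetation areas, then merge) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `stratified_classification`, the class names, and the NDVI/NDWI thresholds are hypothetical, and simple threshold rules stand in for the trained random forest classifiers.

```python
from typing import Callable, Dict, Optional

Features = Dict[str, float]


def stratified_classification(
    objects: Dict[int, Features],
    stage_one: Callable[[Features], Optional[str]],
    stage_two: Callable[[Features], str],
) -> Dict[int, str]:
    """Classify easy objects first, mask them out, then classify the rest."""
    labels: Dict[int, str] = {}

    # Stage 1: label non-vegetation objects; None means "leave for stage 2".
    for obj_id, feats in objects.items():
        label = stage_one(feats)
        if label is not None:
            labels[obj_id] = label

    # Mask: objects that already carry a label are excluded from stage 2.
    remaining = {i: f for i, f in objects.items() if i not in labels}

    # Stage 2: label the remaining (vegetation) objects and merge the results.
    for obj_id, feats in remaining.items():
        labels[obj_id] = stage_two(feats)
    return labels


# Hypothetical stand-ins for the two trained classifiers (assumed thresholds).
def non_vegetation_rule(feats: Features) -> Optional[str]:
    if feats["ndwi"] > 0.2:
        return "water"
    if feats["ndvi"] < 0.2:
        return "built-up"
    return None


def vegetation_rule(feats: Features) -> str:
    return "forest" if feats["ndvi"] > 0.6 else "cropland"


# Toy image objects: id -> object-level features (made-up values).
toy_objects = {
    1: {"ndvi": -0.10, "ndwi": 0.45},
    2: {"ndvi": 0.10, "ndwi": -0.05},
    3: {"ndvi": 0.75, "ndwi": -0.40},
    4: {"ndvi": 0.40, "ndwi": -0.25},
}
result = stratified_classification(toy_objects, non_vegetation_rule, vegetation_rule)
print(result)
```

Passing the two stages in as callables keeps the masking logic independent of the classifier, so any trained model could replace the threshold rules without changing the control flow.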
Key words:
- karst area
- random forest
- ReliefF
- multi-scale segmentation
- stratification strategy
Table 1. Introduction of Sentinel-2 MSI data
| Band | Spatial resolution/m | Wavelength/μm |
| --- | --- | --- |
| B2 | 10 | 0.458~0.523 |
| B3 | 10 | 0.543~0.578 |
| B4 | 10 | 0.650~0.680 |
| B8 | 10 | 0.785~0.900 |
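The four 10 m bands in Table 1 (B2 blue, B3 green, B4 red, B8 near infrared) are the inputs for band-ratio indices such as the NDVI, RVI, SAVI, and NDWI named among the initial features. The sketch below uses the widely used textbook forms of these indices; the source does not state which formulations were applied, so the McFeeters form of NDWI, the SAVI soil factor of 0.5, and the reflectance values are assumptions made for illustration.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def rvi(nir: float, red: float) -> float:
    """Ratio vegetation index."""
    return nir / red


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-adjusted vegetation index (soil_factor = 0.5 is an assumed default)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndwi(green: float, nir: float) -> float:
    """Normalized difference water index, McFeeters form (assumed)."""
    return (green - nir) / (green + nir)


# Made-up mean surface reflectances for one vegetated image object.
bands = {"B2": 0.04, "B3": 0.08, "B4": 0.06, "B8": 0.42}

indices = {
    "NDVI": ndvi(bands["B8"], bands["B4"]),
    "RVI": rvi(bands["B8"], bands["B4"]),
    "SAVI": savi(bands["B8"], bands["B4"]),
    "NDWI": ndwi(bands["B3"], bands["B8"]),
}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```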
Table 2. Initial features
| Feature category | Feature names | Count |
| --- | --- | --- |
| Custom features | NDVI, BAI, NDWI, RDNI, RVI, SAVI | 6 |
| Spectral features | Mean_R, Mean_G, Mean_B, Mean_NIR, Standard_R, Standard_G, Standard_B, Standard_NIR, Brightness, Max.diff | 10 |
| Shape features | Area, Length/Width, Shape_index | 3 |
| Texture features | GLCM_Homogeneity, GLCM_Entropy, GLCM_Correlation, GLCM_Contrast, GLCM_Mean, GLCM_StdDev | 6 |
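Ranking the 25 initial features with ReliefF and keeping the top 15 can be illustrated with a compact pure-Python sketch. It follows the general ReliefF idea (reward features that differ between an object and its nearest neighbours of other classes, penalize features that differ from its nearest neighbours of the same class); it is not the implementation used in the study. The function names, the neighbour count `k`, the use of every sample as a query instance, and the toy data are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def relieff(X: Sequence[Sequence[float]], y: Sequence[str], k: int = 3) -> List[float]:
    """Return one ReliefF weight per feature (higher = more relevant)."""
    n, d = len(X), len(X[0])
    # Scale by each feature's range so per-feature differences lie in [0, 1].
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]

    def diff(a: Sequence[float], b: Sequence[float], f: int) -> float:
        return abs(a[f] - b[f]) / span[f]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(a, b, f) for f in range(d))

    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    prior = {c: len(idx) / n for c, idx in by_class.items()}

    weights = [0.0] * d
    for i in range(n):  # every sample serves as a query (assumption)
        for c, idx in by_class.items():
            # k nearest neighbours of sample i within class c.
            neigh = sorted((j for j in idx if j != i),
                           key=lambda j: dist(X[i], X[j]))[:k]
            if not neigh:
                continue
            # Hits are penalized; misses are rewarded, weighted by class prior.
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for f in range(d):
                mean_diff = sum(diff(X[i], X[j], f) for j in neigh) / len(neigh)
                weights[f] += scale * mean_diff / n
    return weights


def rank_features(names: Sequence[str], weights: Sequence[float], top: int) -> List[str]:
    """Names of the `top` highest-weighted features."""
    order = sorted(range(len(names)), key=lambda f: weights[f], reverse=True)
    return [names[f] for f in order[:top]]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.10, 0.5], [0.15, 0.1], [0.20, 0.9], [0.12, 0.3],
     [0.80, 0.4], [0.85, 0.2], [0.90, 0.8], [0.88, 0.6]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
weights = relieff(X, y, k=3)
print(rank_features(["informative", "noise"], weights, top=1))
```

In the setting of the study, `X` would hold the 25 object-level features of the training samples and `top` would be 15.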
Table 3. Evaluation of image segmentation quality
| Scale parameter | $GS_{R}$ | $GS_{G}$ | $GS_{B}$ | $GS_{NIR}$ | Segmentation quality evaluation |
| --- | --- | --- | --- | --- | --- |
| 20 | 1 | 1 | 1 | 1 | 1 |
| 40 | 0.7068 | 0.6702 | 0.6690 | 0.6569 | 0.6757 |
| 60 | 0.6434 | 0.5955 | 0.6044 | 0.6154 | 0.6147 |
| 70 | 0.6637 | 0.6187 | 0.6273 | 0.6524 | 0.6405 |
| 80 | 0.5639 | 0.5134 | 0.5161 | 0.5345 | 0.5320 |
| 90 | 0.6300 | 0.5815 | 0.5880 | 0.5754 | 0.5937 |
| 100 | 0.8720 | 0.8435 | 0.8489 | 0.8057 | 0.8425 |
| 120 | 1.0196 | 1.0115 | 1.0138 | 0.9124 | 0.9893 |
| 140 | 1.0484 | 1.0484 | 1.0537 | 0.9804 | 1.0327 |
| 160 | 1.1200 | 1.1240 | 1.1266 | 0.9947 | 1.0913 |
| 180 | 0.9249 | 0.9271 | 0.9365 | 0.9718 | 0.9401 |
| 200 | 0.9071 | 0.9094 | 0.9115 | 0.9479 | 0.9190 |
| 220 | 1.0225 | 1.0176 | 1.0109 | 1.0400 | 1.0228 |
| 240 | 1.0684 | 1.0538 | 1.0420 | 1.1715 | 1.0839 |
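The choice of scale 80 can be reproduced from the Table 3 values. In the sketch below, two points are inferences from the numbers rather than definitions given in the text: the last column appears to equal the mean of the four per-band scores, and a lower score is taken to mean better segmentation, which is consistent with the reported optimum. The variable and function names are illustrative.

```python
# Table 3 values: scale -> (GS_R, GS_G, GS_B, GS_NIR, reported overall score).
table3 = {
    20: (1.0, 1.0, 1.0, 1.0, 1.0),
    40: (0.7068, 0.6702, 0.6690, 0.6569, 0.6757),
    60: (0.6434, 0.5955, 0.6044, 0.6154, 0.6147),
    70: (0.6637, 0.6187, 0.6273, 0.6524, 0.6405),
    80: (0.5639, 0.5134, 0.5161, 0.5345, 0.5320),
    90: (0.6300, 0.5815, 0.5880, 0.5754, 0.5937),
    100: (0.8720, 0.8435, 0.8489, 0.8057, 0.8425),
    120: (1.0196, 1.0115, 1.0138, 0.9124, 0.9893),
    140: (1.0484, 1.0484, 1.0537, 0.9804, 1.0327),
    160: (1.1200, 1.1240, 1.1266, 0.9947, 1.0913),
    180: (0.9249, 0.9271, 0.9365, 0.9718, 0.9401),
    200: (0.9071, 0.9094, 0.9115, 0.9479, 0.9190),
    220: (1.0225, 1.0176, 1.0109, 1.0400, 1.0228),
    240: (1.0684, 1.0538, 1.0420, 1.1715, 1.0839),
}


def band_mean(scores):
    """Average the per-band scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Recompute the overall score from the four band columns.
combined = {scale: band_mean(row[:4]) for scale, row in table3.items()}

# Largest gap between the recomputed and the reported overall score.
max_gap = max(abs(combined[s] - table3[s][4]) for s in table3)

# Assumed convention: the lowest combined score marks the optimal scale.
best_scale = min(combined, key=combined.get)
print(best_scale, round(combined[best_scale], 4), max_gap < 1e-4)
```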
Table 4. Comparison of algorithms
| Algorithm | OA | AD | QD | Kappa coefficient |
| --- | --- | --- | --- | --- |
| K-Nearest Neighbor | 0.574 | 0.286 | 0.142 | 0.432 |
| Decision Tree | 0.718 | 0.222 | 0.060 | 0.630 |
| Random Forest | 0.876 | 0.092 | 0.032 | 0.836 |
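The four accuracy measures in Table 4 can all be derived from a confusion matrix. The sketch below computes overall accuracy (OA), quantity disagreement (QD), allocation disagreement (AD), and Kappa following the standard definitions of Pontius and Millones, under which OA + QD + AD = 1. The confusion matrix is made up for illustration, the sample counts are used directly as proportions (an assumption), and the function name is hypothetical.

```python
from typing import Dict, List


def accuracy_measures(matrix: List[List[float]]) -> Dict[str, float]:
    """OA, QD, AD and Kappa from a square confusion matrix.

    Rows are mapped classes and columns are reference classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p = [[cell / total for cell in row] for row in matrix]
    row_sum = [sum(p[g]) for g in range(n)]
    col_sum = [sum(p[i][g] for i in range(n)) for g in range(n)]

    oa = sum(p[g][g] for g in range(n))
    # Quantity disagreement: mismatch in class proportions.
    qd = 0.5 * sum(abs(row_sum[g] - col_sum[g]) for g in range(n))
    # Allocation disagreement: mismatch in spatial allocation.
    ad = 0.5 * sum(
        2.0 * min(row_sum[g] - p[g][g], col_sum[g] - p[g][g]) for g in range(n)
    )
    # Cohen's Kappa from observed and chance agreement.
    chance = sum(row_sum[g] * col_sum[g] for g in range(n))
    kappa = (oa - chance) / (1.0 - chance)
    return {"OA": oa, "QD": qd, "AD": ad, "Kappa": kappa}


# Made-up 3-class confusion matrix (counts of validation samples).
toy_matrix = [
    [40, 5, 5],
    [3, 30, 2],
    [2, 5, 8],
]
measures = accuracy_measures(toy_matrix)
print({name: round(value, 3) for name, value in measures.items()})

# The same identity can be checked on the Table 4 rows (OA, AD, QD).
table4 = {
    "K-Nearest Neighbor": (0.574, 0.286, 0.142),
    "Decision Tree": (0.718, 0.222, 0.060),
    "Random Forest": (0.876, 0.092, 0.032),
}
row_totals = {name: sum(values) for name, values in table4.items()}
```

The Table 4 rows sum to 1 within rounding, which is what the identity predicts when the values are read as fractions rather than percentages.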
Table 5. Comparison of model accuracy
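| Model | OA | AD | QD | Kappa coefficient |
| --- | --- | --- | --- | --- |
| Model A | 0.824 | 0.136 | 0.040 | 0.768 |
| Model B | 0.876 | 0.092 | 0.032 | 0.836 |
| Model C | 0.886 | 0.092 | 0.022 | 0.849 |
| Model C (2023) | 0.868 | 0.106 | 0.026 | 0.825 |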
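The gains between models can be recomputed directly from Table 5, which is a useful check against the figures quoted in the abstract. With the table values as given, Model B improves on Model A by 0.052 in overall accuracy and 0.068 in Kappa, while Model C improves on Model A by 0.062 and 0.081. The short sketch below only performs this arithmetic; the dictionary layout and the function name are illustrative.

```python
# Table 5 values: model -> (OA, AD, QD, Kappa).
table5 = {
    "A": (0.824, 0.136, 0.040, 0.768),
    "B": (0.876, 0.092, 0.032, 0.836),
    "C": (0.886, 0.092, 0.022, 0.849),
    "C (2023)": (0.868, 0.106, 0.026, 0.825),
}


def gain(better: str, baseline: str) -> dict:
    """Absolute change in OA and Kappa between two models in Table 5."""
    oa_new, _, _, kappa_new = table5[better]
    oa_old, _, _, kappa_old = table5[baseline]
    return {
        "OA": round(oa_new - oa_old, 3),
        "Kappa": round(kappa_new - kappa_old, 3),
    }


gain_b_over_a = gain("B", "A")  # effect of ReliefF feature selection
gain_c_over_b = gain("C", "B")  # additional effect of stratified masking
gain_c_over_a = gain("C", "A")  # combined effect

for label, value in [("B vs A", gain_b_over_a),
                     ("C vs B", gain_c_over_b),
                     ("C vs A", gain_c_over_a)]:
    print(label, value)
```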