Influence of different data types and dimension reduction on the recognition accuracy of travertine hyperspectral images
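Summary: Travertine is an important carrier for studying crustal movement, paleoclimate, and other geological settings. Large-scale travertine landscapes support research on geological evolution and, as natural heritage, have high tourism value and conservation significance. Because of global climate change and human activity, travertine is prone to damage and degradation. To facilitate the protection and restoration of travertine resources, this study proposes a hyperspectral recognition method as an alternative to traditional field surveys. Original data (OD), multiplicative scatter correction (MSC) data, first-derivative (FD) data, and second-derivative (SD) data were reduced in dimension by principal component analysis (PCA) and linear discriminant analysis (LDA), and recognition models were then built with four methods: support vector machine (SVM), random forest (RF), BP neural network, and convolutional neural network (CNN). The effects of the dimension reduction method and the data type on the overall classification accuracy (OA) of the models were examined. For the original data, PCA dimension reduction performed better than LDA, and the classification models built on PCA-reduced data were generally more accurate than those built on LDA-reduced data. The recognition models with MSC data as input had a mean accuracy of 88%, the second highest among the four data types and only 0.1% below the highest; their variance and standard deviation, 0.043 and 0.042, were far smaller than those of the models using the other three data types, indicating that the MSC-based models are more stable. In addition, SVM classification models optimized by particle swarm optimization (PSO) performed well on three indicators, F1-score, kappa coefficient, and OA, with SD-PCA-PSO-SVM reaching an accuracy of 98%. In summary, for travertine recognition, unoptimized classifiers are more likely to yield high-accuracy models when MSC data or PCA-reduced original data are used as input, and optimizing the model with a suitable method can further improve its recognition performance.
Abstract: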
Travertine is a carbonate precipitate formed when large quantities of carbon dioxide are released at the Earth's surface, and a large-scale travertine landscape usually takes a long time to form. Travertine landscapes are therefore significant carriers for the study of crustal movement, paleoclimate, and other geological settings. Large-scale travertine landscapes are also natural heritage of high tourist value and conservation significance. This study focuses on the Huanglong Scenic Area in China, a World Natural Heritage site recognized by the United Nations Educational, Scientific and Cultural Organization (UNESCO) and renowned for its extensive surface travertine landscapes with a wide variety of distinctive formations and vibrant colors. In recent years, however, the Huanglong travertine has suffered serious deterioration, such as blackening and algal erosion, which makes the recognition and monitoring of travertine urgent. To facilitate the protection and restoration of travertine resources, this study proposes a travertine recognition method based on hyperspectral reflectance data. The method avoids the drawbacks of traditional field surveys, which are time-consuming, labor-intensive, and potentially destructive to travertine landscapes. The study proceeded as follows. Four types of data were used as classification objects: the original data (OD) and three types of data derived from it by multiplicative scatter correction (MSC), first-order derivative (FD), and second-order derivative (SD) transformation.
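The preprocessing code is not given in the paper. As a minimal NumPy sketch, assuming standard MSC (least-squares regression of each spectrum on the mean spectrum) and plain finite-difference derivatives without Savitzky-Golay or other smoothing, the OD to MSC/FD/SD transforms could look as follows. The function names `msc` and `derivative` and the toy spectra are hypothetical.

```python
from typing import Optional

import numpy as np


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction of a (samples x bands) matrix.

    Each spectrum x is fitted by least squares as x ~ a + b * ref, where ref
    defaults to the mean spectrum, and is replaced by (x - a) / b.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected


def derivative(spectra: np.ndarray, wavelengths: np.ndarray, order: int = 1) -> np.ndarray:
    """Plain finite-difference derivative along the band axis (no smoothing)."""
    d = np.asarray(spectra, dtype=float)
    for _ in range(order):
        d = np.gradient(d, wavelengths, axis=1)
    return d


# Toy example: three spectra that differ only by an additive offset and a gain.
wl = np.linspace(350, 2500, 200)
base = np.exp(-(((wl - 1200) / 300) ** 2))
X = np.vstack([0.10 + 1.0 * base, 0.30 + 0.8 * base, -0.05 + 1.2 * base])
X_msc = msc(X)                     # every row collapses onto the mean spectrum
X_fd = derivative(X, wl, order=1)  # first derivative (FD)
X_sd = derivative(X, wl, order=2)  # second derivative (SD)
```

Because MSC removes exactly the offset-and-gain differences between spectra, the three toy spectra become identical after correction, while the derivatives suppress the additive offset only.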
Then, each of the four data types was reduced in dimension by Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), with the number of retained dimensions determined by the cumulative variance of the data. The reduced data were classified with four classifiers, namely Support Vector Machine (SVM), Random Forest (RF), BP neural network, and Convolutional Neural Network (CNN), using Overall classification Accuracy (OA) as the evaluation index. In addition, Particle Swarm Optimization (PSO) was used to optimize the penalty coefficient C and the kernel parameter gamma of the SVM, and the optimized SVM was used to build a recognition model whose performance was assessed with three indicators: F1-score, Kappa coefficient, and OA. The classification results were then analyzed with respect to the data type and the dimension reduction method. Regarding dimension reduction, PCA was superior to LDA for the original data, and classification models built on PCA-reduced original data were generally more accurate than those built on LDA-reduced data. Regarding data type, the mean accuracy of the models with MSC data as input was 88%, the second highest among the four data types and only 0.1% lower than the highest. However, their variance and standard deviation were 0.043 and 0.042, respectively, much smaller than those of the models using the other three data types, which indicates that the MSC-based recognition models are considerably more stable.
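The abstract does not describe the PSO settings. The sketch below is a generic global-best PSO in NumPy that searches (log10 C, log10 gamma) inside a box; the inertia weight, acceleration coefficients, bounds, swarm size, function names, and the quadratic stand-in objective are all illustrative assumptions, not the values used in this study.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization inside the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                 # personal best positions
    pbest_val = np.array([objective(p) for p in pos])  # personal best scores
    gbest = pbest[np.argmin(pbest_val)].copy()         # swarm (global) best
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull towards personal best + pull towards global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())


# Each particle is (log10 C, log10 gamma). Hypothetical stand-in objective with
# its minimum at C = 10, gamma = 0.01; a real run would return a validation error.
def toy_objective(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, best_val = pso_minimize(toy_objective, lower=[-2, -5], upper=[3, 1])
C, gamma = 10 ** best[0], 10 ** best[1]
```

In a real run, `objective` would return something like 1 minus the mean cross-validated accuracy of scikit-learn's `SVC(kernel="rbf", C=10**p[0], gamma=10**p[1])` obtained with `cross_val_score`. Searching in log space is a common choice because C and gamma usually span several orders of magnitude.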
Finally, the PSO-optimized SVM classification models performed well on all three performance indicators (F1-score, Kappa coefficient, and OA) and were generally superior to the unoptimized SVM recognition models, with the SD-PCA-PSO-SVM model performing best: its F1-score, Kappa, and OA were 0.93, 0.92, and 0.98, respectively. In conclusion, for travertine recognition, an unoptimized classifier is more likely to yield a high-accuracy recognition model when MSC data or PCA-reduced original data are used as input, and optimizing the model with an appropriate method can further improve its recognition performance.
Table 1. Main technical parameters of the PSR-2500 ground-object spectrometer

| Nominal measurement range/nm | Actual measurement range/nm | Wavelength accuracy/nm | Spectral resolution/nm | Acquisitions per sample |
|---|---|---|---|---|
| 350~2500 | 334.3~2535.9 | 5 | ≤3.5 (350~1000); ≤22 (1000~2500) | 10 |
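Table 2. Principal components and total variance explained for the four types of data

| Data type | Principal component | Eigenvalue | Variance/% | Cumulative/% |
|---|---|---|---|---|
| OD | 1 | 648.66 | 84.462 | 84.462 |
| | 2 | 83.886 | 10.923 | 95.384 |
| | … | … | … | … |
| | 26 | 0.004 | 0.001 | 99.997 |
| | 27 | 0.004 | 0.000 | 99.997 |
| MSC | 1 | 648.667 | 84.462 | 84.462 |
| | 2 | 83.886 | 10.923 | 95.384 |
| | … | … | … | … |
| | 26 | 0.004 | 0.001 | 99.997 |
| | 27 | 0.004 | 0.000 | 99.997 |
| FD | 1 | 357.702 | 46.637 | 46.637 |
| | 2 | 81.224 | 10.590 | 57.226 |
| | … | … | … | … |
| | 133 | 0.004 | 0.001 | 99.993 |
| | 134 | 0.004 | 0.000 | 99.993 |
| SD | 1 | 155.906 | 20.353 | 20.353 |
| | 2 | 87.884 | 11.473 | 31.826 |
| | … | … | … | … |
| | 158 | 0.016 | 0.002 | 99.998 |
| | 159 | 0.014 | 0.002 | 100.00 |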
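Table 2 lists the eigenvalue, explained variance, and cumulative variance of each principal component. A minimal NumPy sketch of how such columns are obtained, and of how a component count can be chosen from the cumulative variance, follows; the 0.99 threshold, the function name, and the synthetic data are assumptions, since the cut-off used in the study is not stated here.

```python
import numpy as np


def pca_by_cumulative_variance(X: np.ndarray, threshold: float = 0.99):
    """Keep the fewest principal components whose cumulative explained
    variance reaches `threshold`; X is (samples x bands)."""
    Xc = X - X.mean(axis=0)                      # centre each band
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s ** 2 / (X.shape[0] - 1)      # variance of each PC
    ratio = eigenvalues / eigenvalues.sum()      # "Variance/%" column / 100
    cumulative = np.cumsum(ratio)                # "Cumulative/%" column / 100
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(cumulative))
    scores = Xc @ vt[:k].T                       # projected (reduced) data
    return scores, eigenvalues, cumulative, k


# Synthetic rank-2 data in 10 "bands": two latent factors with different scales.
rng = np.random.default_rng(0)
t, u = rng.normal(size=50), rng.normal(size=50)
X = 3.0 * np.outer(t, np.eye(10)[0]) + 2.0 * np.outer(u, np.eye(10)[1])
scores, eigenvalues, cumulative, k = pca_by_cumulative_variance(X, threshold=0.99)
```

Here the first component explains roughly two thirds of the variance, so two components are needed to pass the 99% threshold.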
Table 3. Relative merits of the two dimension reduction methods for the four types of data
Data type PCA LDA OD √ FD √ MSC √ SD √
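For comparison with PCA, the following is a minimal two-class Fisher LDA sketch in NumPy. It assumes a binary (travertine vs. non-travertine) task, as the TP/TN/FP/FN layout of Table 5 suggests, in which case LDA yields a single discriminant component; the ridge term, the function name, and the synthetic data are hypothetical additions.

```python
import numpy as np


def fisher_lda_direction(X: np.ndarray, y: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Two-class Fisher LDA projection vector w proportional to S_w^-1 (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix S_w
    sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Small ridge keeps S_w invertible when bands outnumber samples (assumption)
    sw += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(sw, m1 - m0)
    return w / np.linalg.norm(w)


# Two synthetic classes in 5 "bands", separated along the first band.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 5)), rng.normal(size=(50, 5)) + [10, 0, 0, 0, 0]])
y = np.array([0] * 50 + [1] * 50)
z = X @ fisher_lda_direction(X, y)  # one LDA component for two classes
```

Unlike PCA, LDA uses the class labels, so the number of components it can return is capped at the number of classes minus one rather than chosen from a cumulative-variance curve.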
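Table 4. Comparison of the classification accuracy of the four types of data after PCA and LDA dimension reduction

| Classifier | Dimension reduction | OD | FD | MSC | SD |
|---|---|---|---|---|---|
| SVM (linear) | PCA | 0.815 | 0.877 | 0.815 | 0.892 |
| SVM (linear) | LDA | 0.753 | 0.646 | 0.877 | 0.707 |
| SVM (poly) | PCA | 0.891 | 0.892 | 0.908 | 0.950 |
| SVM (poly) | LDA | 0.877 | 0.877 | 0.877 | 0.877 |
| SVM (rbf) | PCA | 0.906 | 0.877 | 0.908 | 0.892 |
| SVM (rbf) | LDA | 0.892 | 0.877 | 0.892 | 0.877 |
| SVM (sigmoid) | PCA | 0.877 | 0.908 | 0.877 | 0.877 |
| SVM (sigmoid) | LDA | 0.877 | 0.877 | 0.877 | 0.877 |
| RF | PCA | 0.854 | 0.796 | 0.816 | 0.796 |
| RF | LDA | 0.837 | 0.857 | 0.837 | 0.837 |
| BP | PCA | 0.965 | 0.895 | 0.958 | 0.937 |
| BP | LDA | 0.916 | 0.895 | 0.958 | 0.926 |
| CNN | PCA | 0.895 | 0.937 | 0.875 | 0.916 |
| CNN | LDA | 0.854 | 0.959 | 0.854 | 0.950 |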
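The mean and spread quoted in the abstract can be recomputed from Table 4. This sketch assumes each statistic pools all fourteen values for a data type (seven classifiers times two reduction methods); the exact pooling used by the authors is not stated. For MSC it gives a mean of about 0.881 and a population standard deviation of about 0.042.

```python
from statistics import mean, pstdev

# Overall accuracies from Table 4, per data type, in the order
# SVM linear, poly, rbf, sigmoid, RF, BP, CNN, each as (PCA, LDA).
TABLE4 = {
    "OD": [0.815, 0.753, 0.891, 0.877, 0.906, 0.892, 0.877, 0.877, 0.854, 0.837, 0.965, 0.916, 0.895, 0.854],
    "FD": [0.877, 0.646, 0.892, 0.877, 0.877, 0.877, 0.908, 0.877, 0.796, 0.857, 0.895, 0.895, 0.937, 0.959],
    "MSC": [0.815, 0.877, 0.908, 0.877, 0.908, 0.892, 0.877, 0.877, 0.816, 0.837, 0.958, 0.958, 0.875, 0.854],
    "SD": [0.892, 0.707, 0.950, 0.877, 0.892, 0.877, 0.877, 0.877, 0.796, 0.837, 0.937, 0.926, 0.916, 0.950],
}

# Mean and population standard deviation of the pooled accuracies
summary = {name: (mean(v), pstdev(v)) for name, v in TABLE4.items()}
for name, (m, sd) in summary.items():
    print(f"{name:>3}: mean = {m:.4f}, std = {sd:.4f}")
```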
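Table 5. Performance indices of the PSO-SVM models

| Model | Data type | TP | TN | FP | FN | OA | F1-score | Kappa |
|---|---|---|---|---|---|---|---|---|
| PCA-PSO-SVM | OD | 5 | 38 | 2 | 3 | 0.90 | 0.67 | 0.61 |
| | MSC | 5 | 40 | 2 | 1 | 0.94 | 0.77 | 0.73 |
| | FD | 5 | 38 | 2 | 3 | 0.90 | 0.67 | 0.61 |
| | SD | 7 | 40 | 0 | 1 | 0.98 | 0.93 | 0.92 |
| LDA-PSO-SVM | OD | 6 | 39 | 1 | 2 | 0.94 | 0.80 | 0.76 |
| | MSC | 5 | 40 | 2 | 1 | 0.94 | 0.77 | 0.73 |
| | SD | 6 | 38 | 1 | 3 | 0.92 | 0.75 | 0.70 |
| | OD | 6 | 39 | 1 | 2 | 0.94 | 0.80 | 0.76 |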
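The three indices in Table 5 follow from the confusion counts. Below is a minimal pure-Python sketch using the standard definitions of OA, F1-score, and Cohen's kappa for a 2×2 confusion matrix; which class counts as positive is not stated in this section, and the function name is hypothetical. Applied to the rows of Table 5, it reproduces the reported values after rounding to two decimals.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """OA, F1-score and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + tn + fp + fn
    oa = (tp + tn) / n                      # observed agreement
    f1 = 2 * tp / (2 * tp + fp + fn)        # harmonic mean of precision and recall
    # Chance agreement from the predicted/actual marginals of both classes
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return {"OA": oa, "F1": f1, "Kappa": kappa}


# SD-PCA-PSO-SVM row of Table 5
m = binary_metrics(tp=7, tn=40, fp=0, fn=1)
print({k: round(v, 2) for k, v in m.items()})  # {'OA': 0.98, 'F1': 0.93, 'Kappa': 0.92}
```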