Influence of different data types and dimension reduction on the recognition accuracy of travertine hyperspectral images

XU Menghui^{1, 2}, WANG Weihong^{1, 2, 3}, TIAN Shuojuan¹, ZI Yingkun¹, WU Zhouhang^{1, 2}, WANG Xiaomeng¹, XIANG Hongyao⁴, FAN Jing²

doi: 10.11932/karst20240305

1. School of Environment and Resource, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
2. Mianyang S & T City Division, the National Remote Sensing Center of China, Mianyang, Sichuan 621000, China
3. Tianfu Institute of Research and Innovation, Southwest University of Science and Technology, Chengdu, Sichuan 610299, China
4. School of Civil Engineering and Architecture, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China

Funding: National Natural Science Foundation of China, Joint Fund for Regional Innovation and Development (U21A2016)

About the author: XU Menghui (born 1999), male, master's student; research interest: ground-object recognition with hyperspectral remote sensing. Email: xmhyyds1999@163.com

CLC number: P237; P931

Publication history
- Received: 2023-06-08
- Accepted: 2023-07-31
- Revised: 2023-07-23
- Published online: 2024-08-15
- Issue date: 2024-08-30

Abstract

Travertine is a carbonate precipitate that forms when large quantities of carbon dioxide are released at the Earth's surface. A large-scale landscape of this precipitate usually takes a considerable time to form, so travertine landscapes are an important record for studying crustal movement, paleoclimate, and other aspects of the geological environment. Large-scale travertine landscapes are also natural heritage with high tourism value and conservation significance. This study focuses on the Huanglong Scenic Area in China, which the United Nations Educational, Scientific and Cultural Organization (UNESCO) recognizes as a World Natural Heritage site. The area is renowned for its expansive surface travertine landscapes, with a wide variety of distinctive formations and vibrant colors. Under global climate change and human influence, however, the travertine in Huanglong has deteriorated markedly in recent years, for example through blackening and algal erosion, so recognizing and monitoring travertine is urgent. To facilitate the protection and restoration of travertine resources, this study proposes a recognition method based on hyperspectral reflectance data. The method avoids the drawbacks of traditional field surveys, which are time-consuming, labor-intensive, and potentially destructive to travertine landscapes.

The study proceeded as follows. Four types of data were used as classification inputs: the original data (OD) and three transformations of it, namely multiplicative scatter correction (MSC), the first derivative (FD), and the second derivative (SD). Each type of data was then reduced in dimension by Principal Component Analysis (PCA) and by Linear Discriminant Analysis (LDA), with the retained dimensions chosen according to the magnitude of the cumulative variance. The reduced data were fed into four classifiers: Support Vector Machine (SVM), Random Forest (RF), BP neural network, and Convolutional Neural Network (CNN). Overall classification accuracy (OA) was used as the evaluation index. In addition, Particle Swarm Optimization (PSO) was used to optimize the penalty coefficient C and the gamma parameter of the SVM, and the optimized SVM was used to build a recognition model. Three indicators, F1-score, Kappa coefficient, and OA, were used to assess the performance of this model.

The classification results were examined with respect to data type and dimension reduction method. For dimension reduction, PCA worked better than LDA on the original data, and the classification models built on PCA-reduced original data were generally more accurate than those built on LDA-reduced original data. For data type, the mean accuracy of the models with MSC data as input was 88%, the second highest among the four types of data and only 0.1% lower than the highest. Its variance and standard deviation were 0.043 and 0.042, respectively, much smaller than those of the models with the other three types of data, which indicates that recognition models built on MSC data are considerably more stable. Finally, the SVM classification model optimized by PSO performed well on all three performance indexes (F1-score, Kappa coefficient, and OA) and was generally superior to the unoptimized SVM recognition model. The SD-PCA-PSO-SVM model performed best, with F1-score, Kappa, and OA values of 0.93, 0.92, and 0.98, respectively. In conclusion, for travertine recognition, an unoptimized classifier more readily yields a high-accuracy recognition model when MSC data or PCA-reduced original data are used as input, and choosing an appropriate method to optimize the model can further improve its recognition performance.

Keywords: travertine; hyperspectral image; data dimension reduction and transformation; particle swarm optimization; support vector machine
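
The three spectral transformations named in the abstract can be illustrated with a minimal sketch. It assumes MSC is done by regressing each spectrum on the mean spectrum and that derivatives are plain finite differences; the source does not state the exact procedure (smoothing, window sizes, band spacing), and the function names and toy spectra below are hypothetical.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_bands = len(spectra[0])
    reference = [mean(s[i] for s in spectra) for i in range(n_bands)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit: s ~ intercept + slope * reference
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        # Remove the additive and multiplicative scatter terms
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def derivative(spectrum, step=1.0):
    """Forward finite difference; the result has one band fewer than the input."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


# Toy spectra (hypothetical): the second is a scaled and shifted copy of the first.
base = [0.10, 0.20, 0.35, 0.30, 0.25]
scattered = [0.05 + 1.5 * x for x in base]

corrected = msc([base, scattered])
fd = derivative(base)        # first derivative (FD)
sd = derivative(fd)          # second derivative (SD)
print(corrected[0])
print(corrected[1])
print(len(base), len(fd), len(sd))
```

In this toy case the two spectra differ only by an offset and a gain, so MSC maps both onto the same curve, which is the effect the correction is meant to have on scatter-related variation.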

Figure 1. (a) Seven hyperspectral curves of travertine; (b) six hyperspectral curves of non-travertine features

Figure 2. Flow chart of PSO-SVM model construction
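
The particle swarm search behind a PSO-SVM model can be sketched as follows. This is a minimal illustration, not the authors' implementation: the objective `toy_error` is a hypothetical stand-in for the validation error of an SVM with penalty coefficient C and gamma, the search is assumed to run over log10(C) and log10(gamma), and the swarm size, inertia weight, and acceleration constants are assumed values.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` inside the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    """Hypothetical error surface with its minimum at log10(C)=1, log10(gamma)=-2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


bounds = [(-3.0, 3.0), (-5.0, 1.0)]
best, best_val = pso(toy_error, bounds)
print(best, best_val)
```

In an actual PSO-SVM workflow the objective would train an SVM with the candidate C and gamma and return, for example, one minus the cross-validated accuracy; everything else in the loop stays the same.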

Figure 3. (a, c, e, g) OD, FD, MSC and SD data after LDA dimension reduction, respectively; (b, d, f, h) OD, FD, MSC and SD data after PCA dimension reduction, respectively

Figure 4. Stability analysis of the accuracy of the recognition models for the four types of data

Figure 5. (a) Test-set confusion matrices of the PCA-PSO-SVM models for the four types of data; (b) test-set confusion matrices of the LDA-PSO-SVM models for the four types of data

Table 1. Main technical parameters of the PSR-2500 field spectrometer

Nominal measurement range/nm	Actual measurement range/nm	Wavelength accuracy/mm	Spectral resolution/nm	Acquisitions per sample
350~2 500	334.3~2 535.9	5	≤3.5 (350~1 000); ≤22 (1 000~2 500)	10

Table 2. Principal components and total variance explained for the four types of data

Data type	Principal component	Eigenvalue	Variance/%	Cumulative/%
OD	1	648.66	84.462	84.462
	2	83.886	10.923	95.384
	…
	26	0.004	0.001	99.997
	27	0.004	0.000	99.997
MSC	1	648.667	84.462	84.462
	2	83.886	10.923	95.384
	…
	26	0.004	0.001	99.997
	27	0.004	0.000	99.997
FD	1	357.702	46.637	46.637
	2	81.224	10.590	57.226
	…
	133	0.004	0.001	99.993
	134	0.004	0.000	99.993
SD	1	155.906	20.353	20.353
	2	87.884	11.473	31.826
	…
	158	0.016	0.002	99.998
	159	0.014	0.002	100.00
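
The "Variance/%" and "Cumulative/%" columns follow from the eigenvalues by simple arithmetic: each eigenvalue is divided by the total variance. The sketch below works this through for the two leading MSC components. The total variance of 768 is an assumption inferred from the ratio of the reported eigenvalues to their percentages; the source does not state it, and the function names are hypothetical.

```python
def explained_variance(eigenvalues, total_variance):
    """Return (percent, cumulative percent) for each eigenvalue, in order."""
    rows, cumulative = [], 0.0
    for ev in eigenvalues:
        pct = 100.0 * ev / total_variance
        cumulative += pct
        rows.append((pct, cumulative))
    return rows


def components_needed(eigenvalues, total_variance, threshold_pct):
    """Smallest number of leading components whose cumulative share reaches the threshold."""
    for k, (_, cumulative) in enumerate(
            explained_variance(eigenvalues, total_variance), start=1):
        if cumulative >= threshold_pct:
            return k
    return None  # threshold not reached with the eigenvalues supplied


# Leading MSC eigenvalues from Table 2; the total of 768 is an inferred assumption.
msc_eigenvalues = [648.667, 83.886]
ASSUMED_TOTAL_VARIANCE = 768.0

rows = explained_variance(msc_eigenvalues, ASSUMED_TOTAL_VARIANCE)
k95 = components_needed(msc_eigenvalues, ASSUMED_TOTAL_VARIANCE, 95.0)
for pct, cum in rows:
    print(f"{pct:.3f}  {cum:.3f}")
print(k95)
```

Under that assumption the two percentages round to the tabulated 84.462 and 10.923, and a 95% cumulative-variance threshold would already be met by the first two components; the threshold actually used in the study is not given in this section.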

Table 3. The better-performing dimension reduction method (marked √) for each of the four types of data

Data type	PCA	LDA
OD	√
FD		√
MSC	√
SD		√

Table 4. Comparison of classification accuracy for the four types of data after PCA and LDA dimension reduction

Data type	SVM (linear)		SVM (poly)		SVM (rbf)		SVM (sigmoid)		RF		BP		CNN
	PCA	LDA	PCA	LDA	PCA	LDA	PCA	LDA	PCA	LDA	PCA	LDA	PCA	LDA
OD	0.815	0.753	0.891	0.877	0.906	0.892	0.877	0.877	0.854	0.837	0.965	0.916	0.895	0.854
FD	0.877	0.646	0.892	0.877	0.877	0.877	0.908	0.877	0.796	0.857	0.895	0.895	0.937	0.959
MSC	0.815	0.877	0.908	0.877	0.908	0.892	0.877	0.877	0.816	0.837	0.958	0.958	0.875	0.854
SD	0.892	0.707	0.950	0.877	0.892	0.877	0.877	0.877	0.796	0.837	0.937	0.926	0.916	0.950
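
The stability statistics quoted for the MSC models can be checked against the MSC row of Table 4. The sketch below assumes the statistics are taken over all 14 accuracies in that row and that the standard deviation is the population form; the source does not say how they were computed, so this is only a plausibility check.

```python
from statistics import mean, pstdev

# MSC row of Table 4: SVM (linear, poly, rbf, sigmoid), RF, BP, CNN, each with PCA then LDA
msc_accuracies = [0.815, 0.877, 0.908, 0.877, 0.908, 0.892, 0.877,
                  0.877, 0.816, 0.837, 0.958, 0.958, 0.875, 0.854]

msc_mean = mean(msc_accuracies)
msc_std = pstdev(msc_accuracies)   # population standard deviation (assumed form)

print(f"mean accuracy = {msc_mean:.3f}")
print(f"population standard deviation = {msc_std:.3f}")
```

Under these assumptions the mean rounds to 0.88 and the standard deviation to 0.042, in line with the 88% and 0.042 reported in the abstract.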

Table 5. Performance index values of the PSO-SVM models

Model	Data type	TP	TN	FP	FN	OA	F1-score	Kappa
PCA-PSO-SVM	OD	5	38	2	3	0.90	0.67	0.61
	MSC	5	40	2	1	0.94	0.77	0.73
	FD	5	38	2	3	0.90	0.67	0.61
	SD	7	40	0	1	0.98	0.93	0.92
LDA-PSO-SVM	OD	6	39	1	2	0.94	0.80	0.76
	MSC	5	40	2	1	0.94	0.77	0.73
	SD	6	38	1	3	0.92	0.75	0.70
	OD	6	39	1	2	0.94	0.80	0.76
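
The three indexes in Table 5 can be derived from the four confusion-matrix counts. The sketch below uses the standard binary-classification definitions of overall accuracy, F1-score, and Cohen's kappa, which is an assumption about how the table was produced; the function name is hypothetical. It is applied to the SD row of the PCA-PSO-SVM model.

```python
def binary_metrics(tp, tn, fp, fn):
    """Overall accuracy, F1-score and Cohen's kappa from binary confusion counts."""
    n = tp + tn + fp + fn
    oa = (tp + tn) / n                              # observed agreement
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (oa - p_e) / (1 - p_e)
    return oa, f1, kappa


# SD data, PCA-PSO-SVM (Table 5): TP=7, TN=40, FP=0, FN=1
oa, f1, kappa = binary_metrics(tp=7, tn=40, fp=0, fn=1)
print(f"OA={oa:.2f}  F1={f1:.2f}  Kappa={kappa:.2f}")
```

With these definitions the SD row gives OA 0.98, F1-score 0.93, and Kappa 0.92, matching the tabulated values.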
