Available online, doi: 10.11932/karst2026y025
Abstract:
Yanhe County lies in the transitional slope zone connecting the Guizhou Plateau, the Xiangxi Hilly Region, and the Sichuan Basin. The terrain is higher in the northwest and southeast and lower in the center, and the county's landform is a distinct, narrow, elongated belt that is wider in the south and tapers towards the north. The terrain is highly undulating and intensely incised, the geological structures are complex, and the geological environment is consequently fragile. Under the combined influence of endogenic and exogenic geological processes, landslides are relatively well developed in the area, making geological disaster risk particularly prominent.

To establish a scientifically sound landslide susceptibility evaluation system, this study used the Pearson correlation coefficient, Tolerance (TOL), and Variance Inflation Factor (VIF) to analyze correlation and multicollinearity among candidate influencing factors. Nine evaluation factors were ultimately selected: slope gradient, slope aspect, elevation, topographic relief, engineering rock group, distance to river system, distance to roads, distance to geological structures, and vegetation coverage. To reduce the spatial bias inherent in selecting non-landslide samples, negative samples were randomly placed outside predefined buffers around known landslide points, which improved the spatial representativeness and overall rationality of the sample selection.

Based on raster data units, a combined approach was adopted to draw on the respective strengths of different models.
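The negative-sample selection described above can be sketched as simple rejection sampling. This is only an illustration under stated assumptions: the coordinates, the rectangular extent, the 500 m buffer radius, and the function name `sample_negatives` are all hypothetical, since the abstract does not give the buffer distance or the sampling procedure in detail.

```python
import math
import random


def sample_negatives(landslides, extent, buffer_m, n, seed=0, max_tries=100_000):
    """Draw n random points inside `extent` that lie at least `buffer_m`
    away from every known landslide point (rejection sampling)."""
    xmin, ymin, xmax, ymax = extent
    rng = random.Random(seed)
    negatives = []
    tries = 0
    while len(negatives) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("not enough room outside the buffers")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the candidate only if it is outside every landslide buffer.
        if all(math.hypot(x - lx, y - ly) >= buffer_m for lx, ly in landslides):
            negatives.append((x, y))
    return negatives


# Hypothetical projected coordinates in metres, for illustration only.
landslides = [(1000.0, 1000.0), (4000.0, 2500.0), (7000.0, 8000.0)]
extent = (0.0, 0.0, 10_000.0, 10_000.0)
negatives = sample_negatives(landslides, extent, buffer_m=500.0, n=3)
```

In a raster workflow the same idea would usually be applied to cell centres inside the county boundary rather than to a bounding box.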
The Certainty Factor (CF) model offers advantages in quantitative analysis and indicator weight calculation, while the Support Vector Machine (SVM) model performs well in handling non-linear relationships and complex pattern recognition. A training dataset for the combined machine learning model was constructed from historical landslide distribution data, the nine selected evaluation factors, and the optimized non-landslide samples. The study area was then classified into four susceptibility grades (Low, Medium, High, and Very High) using the Natural Breaks classification method. Prediction accuracy was assessed by the number of known landslide points falling within the Medium, High, and Very High susceptibility zones.

The CF, SVM, and combined CF-SVM models successfully predicted 383, 389, and 392 landslide points, respectively, accounting for 97.70%, 99.50%, and 99.74% of the total recorded landslides. The corresponding areal proportions of the high-risk zones (the High and Very High susceptibility classes) were 76.34%, 69.95%, and 60.96%, respectively. As the areal extent of the high-risk zones decreased across the models, the concentration of landslide points within them increased significantly. This trend indicates an enhanced spatial aggregation of the predicted hazard and improved precision in pinpointing the most vulnerable areas.

In the susceptibility zoning results of all models, the Very High and High susceptibility areas are predominantly distributed in belts that show significant spatial coupling with the major surface water systems and geological structures.
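For readers unfamiliar with the Certainty Factor model named above, the sketch below uses the form of the CF function commonly applied in landslide susceptibility work. It is an assumption that this study uses exactly this form, since the abstract does not state the formula. The function name and the example densities are hypothetical.

```python
def certainty_factor(pp_a, pp_s):
    """Certainty factor of one factor class.

    pp_a: landslide density within the class (conditional probability)
    pp_s: landslide density over the whole study area (prior probability)
    Returns a value in [-1, 1]; positive values mean the class favours
    landslides, negative values mean it does not.
    """
    if not (0.0 <= pp_a < 1.0 and 0.0 < pp_s < 1.0):
        raise ValueError("expected 0 <= pp_a < 1 and 0 < pp_s < 1")
    if pp_a >= pp_s:
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))


# Hypothetical densities, for illustration only.
cf_favourable = certainty_factor(0.02, 0.01)     # class denser than average
cf_unfavourable = certainty_factor(0.005, 0.01)  # class sparser than average
```

One common way to couple the two models is to feed the per-class CF values to the SVM as input features. The abstract does not specify how the coupling was done here.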
These high-risk zones are concentrated in the mountainous areas near Heping Street, Ketian Town, and Xinjing Town, and on the eastern sides of Heishui Town and Qitan Town. These areas typically have low mountain valley topography, well-developed geological structures, dense river networks, intensive transportation infrastructure, a high degree of urbanization, and frequent human activities, all of which contribute to the high frequency of landslides.

In contrast, the Medium and Low susceptibility areas are distributed in scattered patches, though they occur in all townships. The Medium susceptibility zone generally forms a peripheral belt around the Very High and High susceptibility cores. The Medium and Low susceptibility areas lie mainly in mid-mountain valleys and high-altitude mountainous regions. Their geological environmental conditions are complex, but population density is lower, transportation facilities are relatively scarce, urbanization is lower, and human engineering activity is moderate to weak. Landslide hazards are still present in these areas, but their degree of development is relatively limited.

The high-risk zones (High and Very High combined) identified by the CF-SVM model covered 90.3% of the historical landslide points, which is highly consistent with the spatial pattern of recorded geological disaster hazards. By contrast, only 0.26% of the disaster points fell within the Low susceptibility area. This contrast further supports the model's discriminatory capability and predictive accuracy.

The confusion matrix results for the three models indicate that the CF-SVM model performed best, with a True Negative Rate of 91.45% and a True Positive Rate of 83.05%.
The CF model performed relatively weakly, with a True Negative Rate of 85.47% and a True Positive Rate of 68.64%. Overall, all models achieved high True Negative and True Positive Rates, reflecting a strong ability to discriminate unseen samples. However, the models differed significantly in handling the complex and heterogeneous distribution of landslide and non-landslide samples, and the CF-SVM model showed the strongest learning capability and produced the most effective classification results.

The Receiver Operating Characteristic (ROC) curve analysis showed that the CF-SVM model achieved an Area Under the Curve (AUC) of 0.889, higher than the AUC of the standalone SVM model (0.871) and the CF model (0.790). This result indicates that the integrated CF-SVM model has a better goodness-of-fit and stronger generalization ability, and can more accurately delineate the spatial susceptibility characteristics of landslides in Yanhe County. The approach therefore provides a scientific basis and reliable technical support for regional geological disaster risk assessment, monitoring and early warning systems, and prevention and control decision-making.
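The validation metrics reported above (True Positive Rate, True Negative Rate, and AUC) can be computed as in the minimal sketch below. The labels and scores are toy values for illustration only and are not the study's data. The 0.5 decision threshold and the function names are likewise assumptions.

```python
def tpr_tnr(y_true, y_pred):
    """True Positive Rate and True Negative Rate from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random landslide sample scores higher
    than a random non-landslide sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = landslide, 0 = non-landslide; scores are model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tpr, tnr = tpr_tnr(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC shown here is quadratic in the sample count. It is fine for small validation sets, while a rank-based computation is preferable for large ones.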