Landslide susceptibility map using certainty factor for hazard mitigation in mountainous areas of Ujung-loe watershed in South Sulawesi

This study aims to build a landslide susceptibility map (LSM) using the certainty factor (CF) model to support landslide hazard mitigation for people who live near the forest. In the study area, the mountainous part of the Ujung-Loe watershed in South Sulawesi, Indonesia, information on landslides was derived from time-series aerial imagery in Google Earth Pro© from 2012 to 2016 and from field surveys. The LSM was built using a CF model with eleven causative factors. The results indicate that the class with the highest impact on the probability of landslide occurrence is the land use change (LUC) class from dense vegetation to sparse vegetation (4-1), with a CF value of 0.95. The CF method proved to be a good method for producing a landslide susceptibility map for mitigation, with an area under the curve (AUC) success rate of 0.831 and an AUC predictive rate of 0.830; 85.28% of the validation landslides fell into the high to very high susceptibility classes. In conclusion, the correlation between landslide occurrence and the causative factors shows that, overall, the LUC class of change from dense vegetation to sparse vegetation has the highest probability of landslide occurrence. Thus, forest use at these locations should prioritize maintaining dense vegetation and involving the community in protection measures to reduce landslide risk. LSMs based on certainty factors can serve as guidelines for mitigation, so that people living in this area pay attention to locations with high and very high landslide susceptibility and avoid productive activities there.


Introduction
A landslide susceptibility map (LSM) is a very useful tool and plays a vital role in estimating, managing, and mitigating landslide hazards (Chau and Chan, 2005; Dou et al., 2015). LSMs can provide information on the likelihood of landslides occurring in an area based on the local terrain (Dou et al., 2015). However, LSMs rely on a somewhat complicated knowledge of slope movements and their controlling factors. The reliability of an LSM depends mostly on the quality and amount of available data, the scale of the area, and the selection of an appropriate methodology for analysis and modeling. The process of creating the map involves several qualitative and quantitative approaches (Dou et al., 2015).
The qualitative model expresses susceptibility levels in descriptive terms, with the main limitation that its accuracy depends on the knowledge of the expert who conducts the research (Dou et al., 2015; Neuhäuser and Terhorst, 2007). The method investigates the relationships between landslides and causative factors to predict the probability of landslide occurrence, which is a widely used approach among researchers around the world (Dou et al., 2015). Commonly used causative factors are elevation, slope, aspect, curvature, distance to river, drainage density, lithology, distance to faults, rainfall, distance to road, and land use change (Ayalew and Yamagishi, 2004; Chau and Chan, 2005; Dou et al., 2015; Soma and Kubota, 2017a). Certainty factor (CF) is one of the qualitative models. CF applies a rule-based expert system approach to resolve classes of problems (Dou et al., 2015). In this research, CF is used to build an LSM for mitigation of landslide hazards, and it is applied to select the causative factor classes that are positively related to landslide occurrence.
In the mountainous area of the Ujung-Loe watershed, landslides are triggered mostly by rainfall. The dominant lithology units in this area are Quaternary volcanic rock from the Lompobattang range and breccia. The objective of this study is to create an LSM using the certainty factor model for mitigation of landslide hazards for people who live near forest areas in the mountainous part of the Ujung-Loe watershed.

Materials and Methods
The study area is located in the mountainous upper part of the Ujung-Loe watershed, in Bulukumba and Sinjai Regencies, South Sulawesi Province, Indonesia. Landslides occur almost every year in this area and have caused disasters in the past. Landslide susceptibility is higher during the rainy season, which induces flash floods and debris flows from the upstream areas. The elevation ranges from 255 to 2,860 meters above sea level, and the total area is 79.79 km² (Figure 1).

Figure 1 Study area in Ujung-Loe watershed, Indonesia
The climate of this location is tropical, with two seasons: the rainy season and the dry season. The northeast monsoon creates the rainy season between November and July (with maximum precipitation between March and July), and the southwest monsoon causes the dry season from August to October. Annual rainfall data were recorded at three stations, i.e., Apparang Hulu station, Malino station, and Tanete/Bulo-bulo station, from 2010 to 2015. Rainfall recorded at Apparang Hulu station was 2,976 to 5,052 mm/year, with an average annual rainfall of 3,965 mm/year. Rainfall recorded at Malino station was 3,271 to 5,346 mm/year, with an average annual rainfall of 3,933 mm/year, and rainfall recorded at Tanete/Bulo-bulo station was 2,237 to 5,711 mm/year, with an average annual rainfall of 3,538 mm/year. The monthly rainfall is more than 400 mm in December and rises to 1,168 mm in July (Meteorological, Climatology, and Geophysical Agency Makassar, 2016). The increasing intensity of rainfall has translated into increasing occurrences of landslides in this location.
This research is divided into three main stages, i.e., data preparation, data analysis by certainty factor (CF), and validation (Figure 2).

Preparation of data
Data preparation, management, collection, and selection must be accurate in order to establish a spatial landslide inventory and the causative factor layers. The data were prepared using GIS tools in ArcGIS© 10.3. The certainty factor (CF) calculation was carried out in Microsoft Excel© and the ArcGIS© 10.3 environment.

Landslide inventory
A landslide inventory can be compiled from field surveys and from the interpretation of remote sensing images based on spectral characteristics, shape, contrast, and morphological expression (Kanungo et al., 2006). This study used landslide events from the period 2012-2016 to quantitatively evaluate the influence of land use change from 2004 to 2011. Landslides from 2012 to 2016 were collected using aerial imagery from Google Earth Pro© and ground surveys (Figure 1). To identify landslide occurrence by year, we delineated the images year by year from 2012 to 2016. A total of 188 landslides were identified, covering an area of 43.65 hectares (0.44 km²). Most of the landslides are of the shallow type, with minimum and maximum landslide areas of 137 m² and 15,600 m², respectively. The study area was limited to the upper part of the Ujung-Loe watershed. Figure 1 shows the location of all landslide data, which were divided into two groups: landslides for training, 2,873 pixels (70%), and landslides for validation, 1,230 pixels (30%). The training and validation data were selected randomly in the ArcGIS environment.
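The random 70/30 partition described above can be sketched in a few lines. This is a minimal illustration, not the ArcGIS procedure the authors used; the function name `split_landslide_pixels`, the fixed seed, and the choice to split by pixel (rather than by landslide polygon, which the text does not specify) are assumptions.

```python
import numpy as np

def split_landslide_pixels(n_pixels, n_train, seed=0):
    """Randomly split pixel indices 0..n_pixels-1 into training and validation sets."""
    if not 0 <= n_train <= n_pixels:
        raise ValueError("n_train must be between 0 and n_pixels")
    rng = np.random.default_rng(seed)      # fixed seed keeps the split reproducible
    shuffled = rng.permutation(n_pixels)   # random order of all pixel indices
    return shuffled[:n_train], shuffled[n_train:]

# Pixel counts reported in the text: 2,873 training + 1,230 validation = 4,103.
train_idx, valid_idx = split_landslide_pixels(n_pixels=4103, n_train=2873)
print(len(train_idx), len(valid_idx))  # 2873 1230
```

Keeping the two index sets disjoint is what allows the validation landslides to act as an independent check on the map built from the training landslides.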

Landslide causative factors
In landslide susceptibility mapping, the most important assumption is that future landslides will occur under the same conditions as those that caused past landslides. There are no strict guidelines for the selection of causative factors for certainty factor analysis, although such factors have been widely used in many studies (Dou et al., 2015). Furthermore, the determination of landslide causative factors is heavily reliant on data availability. Therefore, we chose causative factors based on the general knowledge found in previous studies (Rasyid et al., 2016) and on data availability in the target area. We use eleven (11) causative factors, i.e., elevation, slope, aspect, curvature, lithology, distance from fault, distance to river, drainage density, precipitation, distance from the road, and land use change (LUC) (Figure 3). These factors were treated as the independent variables, and landslide occurrence was treated as the dependent variable. Both were used as inputs for the landslide susceptibility analysis at a pixel resolution of 10 m × 10 m.

-Slope
The slope was extracted from digital contour data with an interval of 12.5 meters. The digital contour data were derived from the RBI map at a scale of 1:25,000 from the Geospatial Information Agency. We used six slope classes, i.e., 0-10°, 10-20°, 20-30°, 30-40°, 40-50°, and above 50°, which were represented as a slope thematic data layer.

-Aspect
Likewise, the aspect map plays a significant role in slope stability assessment (Chauhan et al., 2010). Aspect was extracted from digital contour data with an interval of 12.5 meters. The digital contour data were derived from the RBI map at a scale of 1:25,000 from the Geospatial Information Agency. Aspect was divided into nine classes, namely flat, north, northeast, east, southeast, south, southwest, west, and northwest.

-Curvature
Curvature was extracted from digital contour data with an interval of 12.5 meters. The digital contour data were derived from the RBI map at a scale of 1:25,000 from the Geospatial Information Agency. Profile curvature was classified into three categories: concave, convex, and flat. The curvature value represents topographic morphology. Profile curvature is associated with inundation conditions after heavy rains. Concave slope profiles contain more water and hold water from heavy rainfall for more extended periods.

-Lithology
The lithology is related to the strength of the material, because lithologic composition and structure vary for different types of rocks (Kanungo et al., 2006), and resistance to the driving force depends on the strength of rocks. There are three lithologies in this area, i.e., Quarter Lompobattang Volcanic Breccia (Qlvb), Quarter Lompobattang Volcanic (Qlv) and Quarter Lompobattang Center (Qlc).

-Distance to faults
Faults are structural features that describe zones of weakness and fractures, along which susceptibility to landslides is higher. It has been observed that the probability of landslide occurrence increases close to faults, which not only affect the surface structure of the material but also contribute to permeability and cause slope instability (Rasyid et al., 2016). For this purpose, the distance to faults is used to analyze landslide occurrence. The distance to faults was calculated by buffering the fault map in ArcGIS© 10.3.

-Distance to river
Distance to river and landslide occurrence in hilly areas have a strong association due to erosion processes. Closer to the river, the soil is more humid, and with higher soil moisture the soil bonds become less stable, which quickly leads to erosion and landslides, especially during the rainy season. The distance from the river was calculated by buffering the river map in ArcGIS© 10.3. The river layer was derived from the RBI map at a scale of 1:25,000. The distance classes begin with 0 to 100 m and end with > 500 m.

-Drainage density
Drainage density was derived from the river lines and analyzed using ArcGIS© 10.3 tools, which classified it into five classes, starting from 0 to 1 km/km² and ending with > 4 km/km². With higher drainage density, the soil is more humid and soil moisture is higher, so the soil bonds become less stable, which can quickly create erosion and landslide conditions, especially during the rainy season.

-Rainfall
Rainfall is a trigger of landslides. Rainfall was classified using data from three rain gauge stations near the study area, which were used to create polygons by Thiessen polygon analysis in ArcGIS© 10.3.
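A Thiessen polygon assigns every location to its nearest rain gauge, so the rainfall classification amounts to a nearest-station lookup. The sketch below illustrates the idea with NumPy only; the station coordinates are hypothetical placeholders (the paper does not list them), and the function name `thiessen_assign` is an assumption, not an ArcGIS tool.

```python
import numpy as np

def thiessen_assign(points_xy, stations_xy):
    """Return, for each point, the index of the nearest station (Euclidean distance)."""
    points = np.asarray(points_xy, dtype=float)
    stations = np.asarray(stations_xy, dtype=float)
    # distances[i, j] = distance from point i to station j
    distances = np.linalg.norm(points[:, None, :] - stations[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Hypothetical coordinates (km) for three gauges and three grid-cell centres.
stations = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
cells = [(1.0, 1.0), (9.0, 1.0), (5.0, 7.0)]
print(thiessen_assign(cells, stations))  # [0 1 2]
```

Each cell then inherits the annual rainfall value of its assigned station, which yields one rainfall class per polygon.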

-Distance to road
As with the distance from the river, the distance from the road was derived from the RBI map at a scale of 1:25,000 collected from the Geospatial Information Agency. Landslide occurrence in hilly areas has a strong association with erosion processes, and road construction changes slope stability and can affect landslides. The distance to the road was divided into nine (9) classes, namely 0-500 m, 500-1000 m, 1000-1500 m, 1500-2000 m, 2000-2500 m, 2500-3000 m, 3000-3500 m, 3500-4000 m, and > 4000 m.
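Reclassifying a continuous distance raster into the nine road-distance classes above is a simple binning step. The following sketch uses `numpy.digitize`; treating each upper bound as inclusive (so that exactly 500 m falls in the first class and only values above 4000 m fall in the last) is an assumption, since the text does not say how boundary values were handled.

```python
import numpy as np

# Upper edges of the first eight classes: 500, 1000, ..., 4000 m.
ROAD_CLASS_EDGES_M = np.arange(500, 4001, 500)

def road_distance_class(distance_m):
    """Map distances in metres to class numbers 1..9 (class 9 is > 4000 m)."""
    # right=True places a value in bin i when edges[i-1] < value <= edges[i].
    return np.digitize(distance_m, ROAD_CLASS_EDGES_M, right=True) + 1

print(road_distance_class(np.array([0, 500, 501, 4000, 4001])))  # [1 1 2 8 9]
```

The same pattern applies to the other distance and density factors by swapping in their class edges.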

-Land use change (LUC)
Land use change is a key factor responsible for landslide events. The incidence of landslides is inversely proportional to the density of vegetation, and this research used the LUC factor to identify vegetation density. Changes in land use on critical slopes have triggered series of shallow and deep landslides (Mugagga et al., 2012; Hasnawir et al., 2017). The land use map was derived from Soma and Kubota (2017a) and comprises open area, paddy field, farming area, scrub, savanna, secondary forest, and primary forest. LUC was built by classifying the 2004 and 2011 land use maps into four density classes, i.e., sparse vegetation (open area, paddy field), medium vegetation (farming area, scrub, savanna), high vegetation (secondary forest), and dense vegetation (primary forest). The two maps were then overlaid using ArcGIS© 10.3, which yielded 13 LUC classes, i.e., 1-1 (no change of sparse vegetation), 1-2 (change from sparse vegetation to medium vegetation), 2-1 (change from medium vegetation to sparse vegetation), 2-2 (no change of medium vegetation), 2-3 (change from medium vegetation to high vegetation), 3-1 (change from high vegetation to sparse vegetation), 3-2 (change from high vegetation to medium vegetation), 3-3 (no change of high vegetation), 3-4 (change from high vegetation to a density similar to dense vegetation), 4-1 (change from dense vegetation to sparse vegetation), 4-2 (change from dense vegetation to medium vegetation), 4-3 (change from dense vegetation to high vegetation), and 4-4 (no change of dense vegetation). The LUC layer, with a pixel size of 30 × 30 m, was resampled to a pixel size of 10 × 10 m.
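The overlay that produces the "from-to" LUC classes, and the resampling from 30 m to 10 m pixels, can be sketched with small arrays. This is an illustration under assumptions: density classes are coded 1 (sparse) to 4 (dense), a change is stored as a two-digit code (41 for class 4-1), and resampling is nearest-neighbour; the paper does not state which resampling method was used in ArcGIS.

```python
import numpy as np

def land_use_change_code(density_2004, density_2011):
    """Encode change as a two-digit code, e.g. 41 = dense (4) to sparse (1)."""
    before = np.asarray(density_2004)
    after = np.asarray(density_2011)
    return before * 10 + after

def resample_30m_to_10m(grid):
    """Nearest-neighbour resampling: each 30 m cell becomes 3 x 3 cells of 10 m."""
    grid = np.asarray(grid)
    return np.repeat(np.repeat(grid, 3, axis=0), 3, axis=1)

# Toy 2 x 2 density maps (hypothetical values, classes 1..4).
d2004 = np.array([[4, 4], [3, 2]])
d2011 = np.array([[1, 4], [3, 1]])

luc = land_use_change_code(d2004, d2011)
print(luc.tolist())                    # [[41, 44], [33, 21]]
print(resample_30m_to_10m(luc).shape)  # (6, 6)
```

Of the 16 possible code pairs, only the 13 listed above occur in the study area.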

Data Analysis by Certainty factor
The certainty factor (CF) is a rule-based expert system method developed by Shortliffe and Buchanan (1975). CF values range between -1 and 1 and indicate a measure of belief and disbelief. They can be calculated using Equation 1, given here in the form commonly used in CF-based landslide studies:

$$CF = \begin{cases} \dfrac{PP_a - PP_s}{PP_a\,(1 - PP_s)} & \text{if } PP_a \ge PP_s \\[2ex] \dfrac{PP_a - PP_s}{PP_s\,(1 - PP_a)} & \text{if } PP_a < PP_s \end{cases} \tag{1}$$

where $PP_a$ is the probability of landslides in class a and $PP_s$ is the prior probability of landslides in the whole study area. A higher CF value indicates a stronger relationship with landslide occurrence.
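As a worked illustration of how a CF value follows from PPa and PPs, the sketch below assumes the piecewise form commonly used in CF-based landslide studies: the difference PPa - PPs is divided by PPa(1 - PPs) when PPa is at least PPs, and by PPs(1 - PPa) otherwise. The function name and the pixel counts in the example are hypothetical and are not taken from Table 1.

```python
def certainty_factor(pp_a, pp_s):
    """CF for one class, from the class probability pp_a and the prior pp_s."""
    if not 0.0 < pp_s < 1.0:
        raise ValueError("pp_s must lie strictly between 0 and 1")
    if not 0.0 <= pp_a <= 1.0:
        raise ValueError("pp_a must lie between 0 and 1")
    if pp_a >= pp_s:
        # Class is more landslide-prone than average: CF in [0, 1].
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    # Class is less landslide-prone than average: CF in [-1, 0).
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))

# Hypothetical counts: 200 landslide pixels in a class of 10,000 pixels,
# against 1,000 landslide pixels in a study area of 100,000 pixels.
pp_a = 200 / 10_000      # 0.02
pp_s = 1_000 / 100_000   # 0.01
print(round(certainty_factor(pp_a, pp_s), 3))  # 0.505
```

A class with no landslide pixels gives CF = -1, and a class whose landslide density equals the study-area average gives CF = 0.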

Validation and verification
In predictive modeling, the most essential and critical component is the validation of the prediction results (Chung and Fabbri, 2003). Data for validation were selected randomly from the landslide occurrences, excluding the training dataset, so that a small portion of the landslide-prone areas served as validation data. The size, area, distribution, and depth of landslides vary significantly from place to place. We used the receiver operating characteristic (ROC) curve to plot predicted probabilities and estimate the model's accuracy. For validating the landslide susceptibility map, the area under the curve (AUC) was used as a measure of overall fit and for comparing modeled predictions. The model with the higher AUC is considered the better one. If the AUC is close to 1, the result of the test is excellent; if the model does not predict well, the value will be closer to 0.5. The success rate is determined from the AUC of the training dataset, and the predictive rate is calculated from the AUC of the validation dataset. ROC curves are used to evaluate the predictive accuracy of a model in a dichotomous statistical approach (Gorsevski et al., 2006), and the AUC obtained from the ROC plot is among the most preferred statistics for rating model performance (Akgun et al., 2012). In this study, the validation process further demonstrates the accuracy of the LSM by calculating the proportion of validation landslides that fall into each susceptibility class. It was assumed that most of the validation landslides should fall in the classes with the highest susceptibility (high and very high).
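The AUC used for the success and predictive rates can be computed directly from susceptibility scores and landslide labels. The sketch below uses the pairwise (Mann-Whitney) interpretation of the AUC, which equals the area under the ROC curve: the probability that a randomly chosen landslide pixel scores higher than a randomly chosen non-landslide pixel, with ties counted as one half. It is an illustration only; the scores are hypothetical, and the pairwise matrix is practical for samples rather than for a full raster.

```python
import numpy as np

def auc_from_scores(scores, labels):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]     # scores at landslide pixels
    neg = scores[~labels]    # scores at non-landslide pixels
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one landslide and one non-landslide sample")
    diff = pos[:, None] - neg[None, :]   # every landslide/non-landslide pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Hypothetical susceptibility scores; label 1 marks a landslide pixel.
scores = [0.9, 0.3, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auc_from_scores(scores, labels))  # 0.75
```

Running the function with the training landslides as positives gives the success rate, and with the validation landslides as positives gives the predictive rate.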

Certainty Factor
Table 1 and Figure 4 show the correlation between landslide occurrence and each class of the landslide causative factors. For elevation, the classes between 1000 and 2000 m have a probability of landslide occurrence, with CF values from 0.29 to 0.66. For slope, the classes above 50°, 40°-50°, 30°-40°, and 20°-30° have CF values of 0.810, 0.752, 0.325, and 0.006, respectively, which indicates a belief in the probability of landslide occurrence, whereas slopes below 20° have CF values <0, which indicates a very low probability of landslide occurrence. In other words, the steeper the slope, the higher the probability of landslide occurrence. Slope gradient has been pointed out as the most substantial cause of landslide occurrence; it affects the concentration of moisture and the level of pore pressure and is often used to resolve detailed patterns of instability. For curvature, only the concave class has a belief in the probability of landslide occurrence, with a CF value of 0.250. For aspect, the north, northwest, south, and northeast facing slopes have CF values >0, which indicates a belief in the probability of landslide occurrence.
For lithology, only Qlv has a CF value >0 among the three lithology classes, which indicates a belief in the probability of landslide occurrence. The distances from faults, rivers, and roads were used to understand how proximity influences landslides. Distances from faults below 7500 m have CF values >0, which shows that the belief in the probability of landslide occurrence increases as the distance from the fault decreases. Distances from the river below 100 m also have CF values >0. Distances from the road above 500 m have CF values >0; landslide densities are higher for the distance classes far from roads, which means that distance to road does not affect landslides in this case, because the study area is located in a mountain area with little road construction. For drainage density, the overall pattern is that higher density corresponds to a stronger belief in the probability of landslide occurrence.
For precipitation, only the class of 3739 mm/year has a CF >0, which indicates a belief in the probability of landslide occurrence. With this increasing rainfall rate, many forest slopes may become unstable and prone to landslide disasters in the near future (Aditian and Kubota, 2017). For LUC, the class of change from dense vegetation to sparse vegetation (4-1) has the highest belief in the probability of landslide occurrence, with a CF of 0.95, followed by the class of change from high vegetation to sparse vegetation (3-1), with a CF of 0.61.

Figure 4 Graph of certainty factor values for each causative factor
Overall, the correlation between landslide occurrence and the causative factors shows that the LUC class of change from dense vegetation to sparse vegetation (4-1) has the highest belief in the probability of landslide occurrence, with a CF of 0.95. This is because the change from dense vegetation (primary forest) to sparse vegetation affects slope stability and thus results in landslides. Hasnawir et al. (2015) point out that the root systems of forested land reinforce soil strength and stabilize the slope, and Soma and Kubota (2017b) show that land use change has significant effects on landslide occurrence and slope instability. In this study area, locations are most susceptible to landslides where the land changes from dense vegetation to sparse vegetation at an elevation between 1000 and 2000 m, on steep slopes facing north or northeast, near rivers and faults, on Quarter Lompobattang Volcanic lithology, and under heavy rainfall.
The landslide susceptibility index (LSI) was created by combining the layers pairwise according to integration rules (Pourghasemi et al., 2013). The combination 'Z' of the CF values X and Y of two thematic layers is expressed by the following equation, as given by Binaghi et al. (1998):

$$Z = \begin{cases} X + Y - XY & \text{if } X, Y \ge 0 \\[1ex] \dfrac{X + Y}{1 - \min(|X|, |Y|)} & \text{if } X, Y \text{ have opposite signs} \\[2ex] X + Y + XY & \text{if } X, Y < 0 \end{cases} \tag{2}$$

The certainty factor values are computed by overlaying each thematic layer with the landslide map and calculating landslide frequencies. Each thematic layer is reclassified according to the calculated certainty factor values, and the layers are combined pairwise to generate the landslide susceptibility map using the integration rule of Eq. 2. Table 2 illustrates the integration using a parallel combination.
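The pairwise integration can be sketched as a function applied repeatedly across the layers. The code assumes the three-branch rule usually attributed to Binaghi et al. (1998): X + Y - XY when both values are non-negative, X + Y + XY when both are negative, and (X + Y) / (1 - min(|X|, |Y|)) when the signs differ. The function names and the CF values in the example are hypothetical, and the guard for the undefined case (+1 combined with -1) is an added assumption.

```python
from functools import reduce

def combine_cf(x, y):
    """Combine the CF values of two layers at one pixel."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    denominator = 1.0 - min(abs(x), abs(y))
    if denominator == 0:
        raise ValueError("combination of +1 and -1 is undefined")
    return (x + y) / denominator

def susceptibility_index(cf_values):
    """Fold the pairwise rule over the CF values of all layers at one pixel."""
    return reduce(combine_cf, cf_values)

print(combine_cf(0.5, 0.5))              # 0.75
print(combine_cf(-0.5, -0.5))            # -0.75
print(round(combine_cf(0.5, -0.25), 4))  # 0.3333
```

Applying `susceptibility_index` to the eleven reclassified layers pixel by pixel yields the LSI that is later grouped into susceptibility classes.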
In this study, one more validation was conducted for the CF method in addition to the AUC. The CF values were combined using Eq. (2) to create the landslide susceptibility map (LSM). The LSM classes were created by reclassifying the LSI using the natural breaks method, and the validation landslide data were overlaid on the LSM, which describes another level of accuracy alongside the AUC. The natural breaks method, or Jenks optimization method, has been used widely, especially by planners, and is designed to determine the best arrangement of values into different classes. This approach maximizes the variance between classes and reduces the variance within classes. The five classes, very low, low, moderate, high, and very high, describe the level of landslide susceptibility in the research location. The accuracy of the landslide susceptibility map was verified by overlaying it with the landslide data for validation. The characteristics of the susceptibility classes on the landslide susceptibility map using the certainty factor method are shown in Figure 5 and Table 3.
Table 4 shows the AUC results for both the success rate and the predictive rate. In general, ROC curves representing excellent, good, and worthless tests can be plotted on the same graph. The accuracy of a diagnostic test is classified by its AUC value: 0.50-0.60 (fail), 0.60-0.70 (poor), 0.70-0.80 (fair), 0.80-0.90 (good), and 0.90-1.00 (excellent) (Rasyid et al., 2016). For the success rate, the CF model obtained an AUC of 0.831, which shows that the CF model is a good model for identifying landslides. For the predictive rate, the AUC is 0.830, which shows that the CF model is good at predicting landslide occurrence. Moreover, the success rate and predictive rate values are within 0.01 of each other, which indicates that the method is reliable for predicting landslides in the future.
The closeness of the success rate and predictive rate values shows how well the method performs in predicting future landslides (Meten et al., 2015). Furthermore, in the validation based on the percentage of landslides falling into the high and very high LSM classes, the CF model reached 85.28%, which is a good result for predicting landslide occurrence. The CF method is therefore a good method for producing an LSM for mitigation in the mountainous area of the Ujung-Loe watershed.
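The two validation summaries above, the AUC rating scale and the share of validation landslides in the high and very high classes, are simple to compute once the map is classified. The sketch below is illustrative: the class codes 1 (very low) to 5 (very high), the function names, the treatment of each lower bound as inclusive, and the sample array are assumptions.

```python
import numpy as np

def auc_rating(auc):
    """Rate an AUC on the fail/poor/fair/good/excellent scale quoted in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "poor"
    return "fail"  # values below 0.5 are also treated as fail here

def percent_in_high_classes(susceptibility_classes, high_codes=(4, 5)):
    """Percentage of validation pixels whose class is high (4) or very high (5)."""
    classes = np.asarray(susceptibility_classes)
    if classes.size == 0:
        raise ValueError("no validation pixels supplied")
    return 100.0 * np.isin(classes, high_codes).mean()

print(auc_rating(0.831), auc_rating(0.830))  # good good

# Hypothetical susceptibility classes sampled at five validation landslide pixels.
print(round(percent_in_high_classes([5, 4, 3, 5, 1]), 1))  # 60.0
```

With the reported AUC values of 0.831 and 0.830, both rates fall in the "good" band of the scale.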

Mitigation of Landslide Hazards
The analysis of the landslide susceptibility map using the certainty factor indicates that the LUC class of change from dense vegetation to sparse vegetation (4-1) has the highest belief in the probability of landslide occurrence. Mitigation should therefore pay attention to this change, especially on steep slopes, to protect the people who live near the forest. It is better to use the forest in accordance with its function, without converting forest area, by applying a community forestry system to reduce landslides caused by deforestation. LSMs based on certainty factors can serve as guidelines for mitigation, so that people living in this area pay attention to locations with high and very high landslide susceptibility and take care in, or avoid staying in, those locations (Figure 5).

Conclusions
In conclusion, the correlation between landslide occurrence and the causative factors shows that the LUC class of change from dense vegetation to sparse vegetation (4-1) has the highest probability of landslide occurrence, with a certainty factor (CF) of 0.95. CF was a good method for producing a landslide susceptibility map, with an AUC success rate of 0.831 and an AUC predictive rate of 0.830, and 85.28% of the validation landslides fell in the high to very high classes. Thus, it is better to manage forest areas so as to maintain high vegetation density, in partnership with communities, to reduce landslides resulting from deforestation. LSMs based on certainty factors can serve as a guideline for landslide mitigation, reducing the risk to people living in this area by drawing attention to locations with high and very high landslide susceptibility, where changing the vegetation or building should be avoided.