Correlation of Climate Variability and Burned Area in Borneo using Clustering Methods

The island of Borneo has faced seasonal forest fires for decades. The fires worsen during dry seasons, especially when droughts coincide with the El Niño-Southern Oscillation (ENSO). Climate is therefore one of the drivers of fire occurrence. This paper studies the relationship between climate variables, namely temperature, precipitation, relative humidity, and wind speed, and the occurrence of forest fires using two clustering methods, K-means and Fuzzy C-means (FCM). Borneo is clustered into four areas based on burned area data obtained from the Global Fire Emissions Database (GFED). It is also clustered according to combinations of climate variables. Both methods reach the highest correlation between the climate variable clusters and the burned area clusters in September: K-means gives a correlation of -0.54, while FCM gives -0.55. From August until October, relative humidity provides the dominant correlation with burned area, although adding a precipitation or wind variable slightly increases the correlation in the FCM method. In November, temperature correlates most strongly with burned area, with a positive correlation of 0.31 in K-means and 0.33 in FCM. The performance of the methods is evaluated with an internal validation measure, the Silhouette index. Both methods have positive index values ranging from 0.39 to 0.69, and the maximum value is obtained for the wind cluster. This indicates that the clustering methods applied in this paper can group one or a combination of climate variables into dense and well-separated clusters.


Introduction
Forest and land fires have become a concern of the global community, especially since the El Niño-Southern Oscillation (ENSO) events in 1982-1983, 1994, and 1997-1998 resulted in a high incidence and intensity of fires (Kita, Fujiwara, & Kawakami, 2000). The El Niño phenomenon reduces rainfall throughout the year, resulting in a longer dry season (Fadholi, 2013). Other climatic factors such as temperature, humidity, and wind also influence forest fires (Aryadi, Satriadi, & Syam'ani, 2018). Forest and land fires degrade natural ecosystems, for example through forest and vegetation damage (Margono et al., 2012). Uncontrolled fires are a major factor contributing to the loss of forests and land in the tropics (Siegert & Hoffmann, 2000). Forest fires are also considered a threat to sustainable development because of their contribution to the increase of carbon emissions (Tacconi, Moore, & Kaimowitz, 2007). Barber (2000) stated that forest fires have occurred in Borneo since the 17th century. They are caused by many factors, such as climate, environment, and socio-economic factors (Barber & Schweithelm, 2000). When droughts caused by El Niño occur, as in 2015 and 2019, the forest fires are particularly pronounced. This suggests that climate plays an important role in the fires in Borneo. Several previous studies (Aflahah, Hidayati, Hidayat, & Alfahmi, 2018; Latifah, Shabrina, Wahyuni, & Sadikin, 2019; Mareta, Hidayat, Hidayati, & Latifah, 2019) identify several climate variables affecting fires in Borneo. Based on information about climate and burned areas in the past, precise information can be generated to help prevent and reduce the risk of fires in the future.
Clustering methods such as K-means and Fuzzy C-means (FCM) are well established and have been widely used to cluster forest fires for fire risk analysis. Previous studies (Pramesti, Lahan, Tanzil Furqon, & Dewi, 2017; Shoolihah, Furqon, & Widodo, 2017; Sukamto, Id, & Angraini, 2018) implemented these methods to cluster fire occurrence over Indonesia based on MODIS satellite-based fire hotspot data. Their studies highlight that the K-means clustering method can analyze fire data and capture potential fires at a high, medium, or weak level. Another study (Wang, Liu, & Cui, 2017) showed that the K-means clustering method can quickly and accurately map satellite images of fires and recognize images of burning land in experiments with large amounts of data. Ganesan (2016) compared the K-means and FCM clustering methods in detecting fire zones in Oregon, United States of America. His study concluded that the combination of the FCM clustering method with a genetic algorithm as a segmentation method was more effective than the K-means clustering method for choosing the best pixel photos. Furthermore, the K-means clustering method was also applied to support the automatic identification of forest fires using spatial digital photos (Prasad & Ramakrishna, 2008). Mustofa (2014) predicted the size of forest fires from climate variables and the forest fire weather index using the FCM clustering method, which successfully classified fire levels into three categories, namely low, light, and heavily burned areas (Shidik & Mustofa, 2014). Jafarzadeh (2017) studied the relationship between climate and geographical variables and fire events in Ilam Province, western Iran. FCM was used to map all of the variables into three clusters, namely low, medium, and high. Their study concluded that there was a strong relationship between the forest fire clusters and the clusters of 8 input variables: distance from settlements, population density, distance from a road, slope, standing dead oak trees, temperature, land cover, and distance from farmland (Jafarzadeh, Mahdavi, & Jafarzadeh, 2017). Differing from the aforementioned studies, in this paper we apply global satellite data that contain dense spatial information about climate and burned areas to cluster areas over Borneo. This choice is also motivated by the limited availability of local data in Borneo.
In this paper, we focus on studying only the climate variables, namely temperature, precipitation, relative humidity, and wind speed. This paper clusters Borneo based on the climate characteristics then relates the climate variables to the occurrence of the forest fires that are obtained from the Global Fire Emissions Database (GFED). This study aims to investigate the correlation between the cluster of forest fires and the cluster of climate variables over Borneo Island. Mapping in Borneo can provide information to predict and to prevent the damage of forests and land in the region.

Data and sources
This paper takes the whole of Borneo Island, as shown in Figure 1, as the study area, without being limited to the administrative areas of the state, province, or district. Geographically, Borneo is bordered by the South China Sea to the north, the Sulawesi Sea to the east, the Java Sea to the south, and the Karimata Strait to the west, and it lies on the equator. It is the third largest island in the world and has diverse climate variations. Most of the land in Borneo is covered by primary and secondary forest that is highly vulnerable to fires. In the simulation, the area is divided into a 40 x 44 horizontal grid with a cell size of 0.25 degrees. After ocean cells are excluded, 1058 grid cells remain.

Figure 1. Borneo map based on topographic data in meters
This study uses monthly burned area and climate data for the period from 1 January 1998 to 1 December 2015. The burned area data are obtained from the Global Fire Emissions Database (GFED), which provides globally mapped burned area data from 1997 onward (Giglio, Randerson, & Van Der Werf, 2013). It combines satellite information on fire activity and vegetation productivity to estimate gridded monthly burned area and fire emissions. The climate data, consisting of monthly precipitation (PCP), air temperature (TEMP), relative humidity (RH), and wind speed (WIND), are obtained from the ERA-Interim reanalysis. Since the forest fires in Borneo occur seasonally, starting in August and ending approximately at the beginning of December, the monthly burned area and climate data for August until November in 1998-2015 are analyzed.

Clustering methods
K-means method
The K-means method is a grouping algorithm. It is one of the oldest and most popular among practitioners because of its ease of implementation and processing speed (Xu, Chiang, Liu, & Tan, 2017). It belongs to the unsupervised learning methods (Suyanto, 2018), which can organize data even if they do not have class labels. The K-means algorithm classifies the given points into k groups so that the distance between members of the same group is minimized (Natingga, 2017). The algorithm first determines k initial centroids (the points in the middle of the clusters), one for each cluster. Each feature is then assigned to the cluster whose centroid is closest to that feature. Once all features have been assigned, the initial k clusters are formed. For each cluster, the centroid is recalculated as the average of the points in that cluster. After the centroids have moved, the assignment is recalculated, and features can change clusters. The centroids are then recalculated again. When the centroids no longer move, the K-means algorithm ends.
To group the data, the distance from each data object to every centroid is calculated, and the object is assigned to the centroid at the shortest distance. The formula for calculating distances is shown in Equation (1),

$$d(x_i, x_j) = \left( \sum_{l=1}^{p} \left| x_{il} - x_{jl} \right|^{g} \right)^{1/g} \tag{1}$$

where $g = 1$ gives the Manhattan distance, $g = 2$ gives the Euclidean distance, and $g = \infty$ gives the Chebyshev distance. $x_i$ and $x_j$ are the two data objects whose distance is calculated, and $p$ is the dimension of the data (Maimon & Rokach, 2010). An object is assigned to cluster $k$ such that the Sum of Squared Errors (SSE) of the object to the centroid of cluster $k$ is minimized. The SSE is the total of the squared distances, over all dimensions, between each cluster member and the corresponding cluster centroid. The centroid is then updated as the average of the data objects in each dimension,

$$\mu_k = \frac{1}{N_k} \sum_{q=1}^{N_k} x_q$$

where $\mu_k$ is the centroid of cluster $k$, $N_k$ is the number of data objects in cluster $k$, and $x_q$ is the $q$-th data object in cluster $k$.
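The assignment and centroid-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the MATLAB toolbox routine used in this study: the function names, the exponent name `g` for the distance in Equation (1), the random initialization, and the convergence check are all assumptions made for the example. The mean-based centroid update is the classical choice for the Euclidean case (`g = 2`).

```python
import numpy as np

def minkowski(points, center, g=2.0):
    """Distance of Equation (1) from each row of `points` to one `center`.
    g=1: Manhattan, g=2: Euclidean, g=inf: Chebyshev."""
    diff = np.abs(points - center)
    if np.isinf(g):
        return diff.max(axis=-1)
    return (diff ** g).sum(axis=-1) ** (1.0 / g)

def kmeans(data, k, g=2.0, max_iter=100, seed=0):
    """Hypothetical helper: returns (labels, centroids, sse) for a float array
    `data` of shape (n_points, n_dims)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = data[rng.choice(len(data), size=k, replace=False)]

    def assign(cents):
        # Distance of every point to every centroid, shape (n_points, k).
        dist = np.stack([minkowski(data, c, g) for c in cents], axis=1)
        return dist.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroids)
        # New centroid = mean of the members; an empty cluster keeps its centroid.
        updated = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break  # centroids no longer move
        centroids = updated

    labels = assign(centroids)
    # Sum of squared errors of every point to its own cluster centroid.
    sse = float(((data - centroids[labels]) ** 2).sum())
    return labels, centroids, sse
```

For the present setting, `data` would hold one row per land grid cell and one column per climate variable, with `k = 4`.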

Fuzzy c-means method
The Fuzzy C-means (FCM) clustering method was introduced by Dunn (1973) and developed further by Bezdek (1981). The FCM method is an extension of the K-means, or hard c-means, clustering method. FCM is an unsupervised clustering algorithm used to analyze the distance between data points (Ghosh & Kumar, 2013). FCM has been widely applied to problems related to feature analysis and clustering in various fields such as agricultural engineering, astronomy, chemistry, geology, image analysis, medical diagnosis, shape analysis, and target recognition (Yong, Chongxun, & Pan, 2004).
In contrast to the K-means method, FCM groups the data set into $c$ clusters so that every data point is related to every cluster with a different degree of belonging. This degree indicates how strongly a data point belongs to each cluster. Data points that lie far from the center of a cluster have a low degree of belonging to that cluster. The method creates an optimal partition by minimizing the objective function

$$J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, d^2(x_i, v_j)$$

where $X = \{x_1, x_2, \ldots, x_n\}$ is the $p$-dimensional data set, $n$ is the number of data points, $c$ is the number of clusters with $2 \le c < n$, $u_{ij}$ is the degree of belonging of $x_i$ to cluster $j$, and $m$ is a weighting exponent on each fuzzy membership that determines the amount of fuzziness of the clustering. $v_j$ is the center of cluster $j$, and $d^2(x_i, v_j)$ is the squared distance between a data point and the cluster center. The solution of the objective function can be obtained by an iterative process with the algorithm described in Bezdek, Ehrlich, & Full (1984).

The initial step is to choose the number of clusters $c$, with $2 \le c < n$, and the weighting exponent $m$, and then to initialize the partition matrix $U^{(0)}$. With the loop counter $b = 0, 1, 2, \ldots$, each iteration proceeds as follows:
1. Calculate the vector of cluster centers, $v_j^{(b)} = \sum_{i=1}^{n} \left(u_{ij}^{(b)}\right)^{m} x_i \Big/ \sum_{i=1}^{n} \left(u_{ij}^{(b)}\right)^{m}$.
2. Calculate the distance matrix, $D[i, j] = \left( \sum_{l=1}^{p} \left( x_{il} - v_{jl} \right)^{2} \right)^{1/2}$.
3. Update the partition matrix for the $(b+1)$-th step, $u_{ij}^{(b+1)} = 1 \Big/ \sum_{k=1}^{c} \left( D[i, j] / D[i, k] \right)^{2/(m-1)}$.
If $\| U^{(b+1)} - U^{(b)} \| < \varepsilon$, the process stops; otherwise it returns to step 1, updating the cluster centers and the degrees of belonging iteratively (Rao & Vidyavathi, 2010).
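The three-step FCM iteration above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the MATLAB routine used in this study: it uses the standard FCM center and membership updates with Euclidean distance, a random initial partition matrix, and the Frobenius norm in the stopping test; the function name and the default values of `m`, `eps`, and `max_iter` are hypothetical.

```python
import numpy as np

def fuzzy_c_means(data, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Hypothetical helper: returns (u, centers), where u[i, j] is the degree
    of belonging of point i to cluster j. Requires m > 1."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Assumption: random initial partition matrix whose rows sum to 1.
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Step 1: cluster centers as membership-weighted means.
        w = u ** m
        centers = (w.T @ data) / w.sum(axis=0)[:, None]
        # Step 2: Euclidean distance of every point to every center, shape (n, c).
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero at a center
        # Step 3: membership update, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.linalg.norm(u_new - u) < eps
        u = u_new
        if converged:
            break
    return u, centers
```

A hard cluster map comparable to the K-means result can be obtained by assigning each grid cell to its cluster of highest membership, for example `u.argmax(axis=1)`.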

Processing and clustering data
This study uses climate data in netCDF format and burned area data in HDF5 format. The climate data comprise monthly precipitation, temperature, relative humidity, and wind speed. For each month, each climate variable is a three-dimensional array of size 40 x 44 x 18, consisting of 40 latitude points, 44 longitude points, and 18 years from 1998 to 2015. The GFED burned area data for each month are subset to the latitude and longitude range of Borneo and averaged over 1998-2015. To avoid unnecessary computation over the ocean, the climate data at ocean points are set to NaN. All processed data are then clustered with the K-means and FCM algorithms using the clustering toolbox in MATLAB.
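The masking and reshaping step can be illustrated with a short NumPy sketch. The study itself used MATLAB, so this is a substitute illustration only. It assumes that each variable has already been read into a 40 x 44 x 18 array, that a boolean land mask is available, and that each climate variable is averaged over the 18 years before clustering (the text states this averaging only for the burned area data). The function names are hypothetical.

```python
import numpy as np

def grid_to_features(fields, land_mask):
    """fields: list of arrays of shape (n_lat, n_lon, n_years), one per variable.
    land_mask: boolean array (n_lat, n_lon), True on land.
    Returns (features, land_index): one row per land cell, one column per variable."""
    maps = []
    for field in fields:
        mean_map = field.mean(axis=2)                       # multi-year mean for the month
        maps.append(np.where(land_mask, mean_map, np.nan))  # ocean points become NaN
    stacked = np.stack([m.ravel() for m in maps], axis=1)   # (n_cells, n_variables)
    # Keep only cells where no variable is NaN, i.e. land cells with valid data.
    land_index = np.flatnonzero(~np.isnan(stacked).any(axis=1))
    return stacked[land_index], land_index

def labels_to_grid(labels, land_index, shape):
    """Put per-cell cluster labels back on the (n_lat, n_lon) grid; ocean stays NaN."""
    grid = np.full(shape[0] * shape[1], np.nan)
    grid[land_index] = labels
    return grid.reshape(shape)
```

Dropping the NaN rows before clustering and restoring them afterwards keeps the clustering routines free of missing values while preserving the map layout for plotting.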

Silhouette Index
In this paper, we validate the clustering results using the Silhouette index proposed by Rousseeuw (1987), an internal validation measure for clustering methods. The Silhouette is a widely used index for assessing the fit of individual objects in the classification as well as the quality of clusters, and it combines two clustering criteria, compactness and separation (Lengyel & Botta-Dukát, 2019). It therefore measures how well the data have been clustered. The index value ranges from -1 to +1. The higher the Silhouette index value, the better the data are clustered. If the value is very low or negative, the clustering configuration is less appropriate, and there may be too many or too few clusters.
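For reference, Rousseeuw's silhouette value of an object i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster. The sketch below averages s(i) over all objects. It assumes Euclidean distance and at least two clusters, which may differ from the settings of the MATLAB routine used in this study; the function name is hypothetical.

```python
import numpy as np

def silhouette_index(data, labels):
    """Mean silhouette value over all objects (range -1 to +1)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(data))
    for i in range(len(data)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: s(i) = 0 for a single-member cluster
        # a(i): mean distance to the other members of the same cluster.
        a = dist[i, own].sum() / (n_own - 1)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(dist[i, labels == other].mean()
                for other in clusters if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())
```

Values close to +1 indicate compact clusters that are far from each other, which is the sense in which the index combines compactness and separation.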

Cluster of burned area
Using the K-means and FCM clustering methods, the burned area data for August until November in the period 1998-2015 are grouped into four levels. The number of levels follows the Regulation of the Minister of Environment and Forestry of the Republic of Indonesia on the technical criteria for the alert and emergency status of forest and land fires. There are four status types, namely normal, standby, emergency alert, and emergency (Kementerian Lingkungan Hidup dan Kehutanan Republik Indonesia, 2018). The upper plot in Figure 2 shows the K-means result, while the lower plot shows the FCM result. The cluster levels in Figure 2 are shown in blue, light blue, yellow, and red. Both methods show similar results: the area is dominated by cluster 1 (blue), which is centered at almost zero, and only a few areas are highly burned (red). Cluster 1 represents no fire. Small fires are grouped in cluster 2 (light blue) and mostly occur in the southern, eastern, and central parts of Borneo. Larger fires, shown in cluster 3 (yellow), mostly appear in September and October. Cluster 3 represents areas where approximately 1% of the area is burning. The largest fires, shown in cluster 4, occur only over a few grid cells in the southern part of Borneo, especially in September.
In the months of fire susceptibility, it can be concluded that the clusters with small to large burned areas are located mostly in Central, South and East of Borneo. The results of the cluster process using burned area data from the satellite have similarities with the results of previous studies using hotspots as a representation of forest fire data. Sari (2020) concluded that Berau, East Kutai and Kutai Kartanegara are the sites with the highest hotspots in Eastern Borneo classified by Nearest Neighbor Analysis (Sari, Rachmita, & Manessa, 2020). This result is also shown by the light blue area (cluster 2) in Eastern Borneo in Figure 2. The data from the Ministry of Environment and Forestry in the Forestry and Land Monitoring System shows that Central Kalimantan or Southern Borneo is one of the provinces in Indonesia with a high level of forest fires. In addition, Pratamasari (2020) shows that the highest hotspot area in Central Kalimantan or Southern Borneo is located in East Kotawaringin, Palangkaraya City, and Pulang Pisau based on Kernel Density analysis (Pratamasari, Permatasari, Pramudiyasari, Manessa, & Supriatna, 2020). This corresponds to the red area or cluster 4 in September and November in Figure 2.

Cluster of climate variables and the relationship with burned area cluster
We cluster Borneo into four areas based on single or multiple climate variables using both the K-means and FCM clustering methods. The clustering results are then correlated with the burned area clusters. Figure 3(a) shows the correlation between the burned area clusters and the climate variable clusters computed by the K-means clustering method. In this method, relative humidity gives the largest correlation, and the correlation with the burned area is negative. This agrees with the expectation that larger fires occur in a drier climate. The magnitude of the relative humidity correlation is highest among the variables, at 37.86% in August, 54.8% in September, and 54.28% in October. Using the FCM method, adding the precipitation variable to relative humidity gives the highest correlation: 38.52% in August and 55.52% in September. In October, adding the wind speed variable to relative humidity gives the highest correlation at 50.89%. In November, both methods reach the highest correlation with a single climate variable, namely temperature, which has a positive correlation with the burned area.
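One way to correlate two cluster maps is sketched below. The text does not state which correlation measure was used, so this example assumes a Pearson correlation between ordinal cluster labels over the land grid cells, after relabeling each clustering so that cluster 1 has the smallest centroid value and cluster 4 the largest. Both function names and the ordering rule are hypothetical.

```python
import numpy as np

def order_labels(labels, centroid_values):
    """Relabel clusters 1..k by increasing centroid value.
    labels: integer cluster ids 0..k-1 per grid cell.
    centroid_values: one scalar per cluster id (e.g. its mean relative humidity)."""
    order = np.argsort(centroid_values)          # cluster ids from lowest to highest
    rank = np.empty_like(order)
    rank[order] = np.arange(1, len(order) + 1)   # rank[id] = ordinal level of that id
    return rank[np.asarray(labels)]

def cluster_correlation(levels_a, levels_b):
    """Pearson correlation between two ordinal cluster maps on the same cells."""
    return float(np.corrcoef(levels_a, levels_b)[0, 1])
```

Under this convention, a negative value means that cells in low climate-variable clusters tend to fall in high burned area clusters, which is the sign reported here for relative humidity.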
The discussion above shows that, based on the K-means method, relative humidity is a very influential variable for the occurrence of forest fires, whereas with the FCM method, adding the precipitation variable improves the correlation with the burned area. The results of this study are in line with previous studies. Wu (2018) concluded that wind speed and relative humidity are the dominant weather parameters in determining burn severity. The size of the burned area is negatively correlated with relative humidity, while complexity and aggregation in the burned area are positively correlated with wind speed in the boreal forests of northeast China (Wu, He, Fang, Liang, & Parsons, 2018). Konca-Kędzierska (2019) conducted a study that focused on low humidity in a selected area to determine the risk of fire in the forests of Central Poland. The study shows that low relative humidity is a conducive condition for forest fires. Low air humidity also has a large correlation with forest fires in April-August (Konca-Kędzierska & Pianko-Kluczyńska, 2019). The result in section 3.1 shows that large-scale fires in clusters 3 and 4 occurred in several areas in Central Kalimantan, especially in September. According to data from the Ministry of Environment and Forestry, the area in Central Kalimantan, or southern Borneo, is covered by 80% forest and 20% peatland. When the amount of rainfall is lower than usual, climate conditions become drier and the peatlands dry out, which maintains conditions where ignition can easily occur and makes them prone to fire (Putra, Vetrita, & Graham, 2016; Putra & Hayasaka, 2011). A low level of relative humidity combined with a reduced amount of rainfall makes the peatland area flammable. Usup et al. (2004) found that during the dry season, degraded peatlands experienced a decrease in surface water, making them more flammable.
Shrubs in degraded areas can easily dry out and become flammable materials. The presence of fires combined with dry conditions, especially in combustible materials, can make forest fires difficult to extinguish and longer lasting (Usup, Hashimoto, Takahashi, & Hayasaka, 2004). Syaufina & Sukmana (2008) stated that precipitation is one of the climate elements that has a high correlation with the occurrence of forest fires. This contradicts our finding that precipitation (PCP) as a single variable has a relatively low correlation compared to the other variables in both methods. Nevertheless, our finding agrees closely with other previous studies: Aflahah et al. (2018) and Pratamasari et al. (2020) show a low correlation between precipitation and hotspot data. When precipitation and visibility are low and the temperature is high, the number of hotspots increases, so that the potential for forest and land fires is greater (Putra, Hayasaka, Takahashi, & Usup, 2008).

Figure 4. Clusters of relative humidity using the K-means method, and of relative humidity and precipitation-relative humidity using the FCM method (top to bottom), in August to November (from left to right). The colors represent cluster divisions: cluster 1 (blue), 2 (light blue), 3 (yellow), and 4 (red)
The upper plots of Figure 4 show the clusters of relative humidity in August-November using the K-means clustering method, while the lower plots show the clusters of relative humidity and precipitation using the FCM method. In each month, both methods give similar cluster areas. Clusters 1 to 4 are located sequentially from southern to northern Borneo. Compared with Figure 2, the distribution of these cluster locations has a roughly opposite pattern to that of the burned area clusters. The largest fires, illustrated by cluster 4 (red) in Figure 2, are correlated with cluster 1 (blue) in Figure 4.
Cluster 1 in Figure 4 is centered at (92.6006; 90.0252; 89.7568; 93.2711) for the relative humidity cluster and (0.0432; 0.0091; 0.0616; 0.2609) for the precipitation cluster using the FCM method, and at (92.5789; 90.1568; 89.9090; 93.3438) for the relative humidity cluster using the K-means method. These are the lowest levels of the climate variables in the selected months. This can be interpreted to mean that low levels of precipitation and relative humidity are correlated with a vast fire area, that is, low precipitation and relative humidity can increase the risk of forest fires.
For data that do not have ground truth labels as a reference, the Silhouette index can be used to evaluate the model. In Table 4, the results for the K-means and FCM methods have values ranging from 0.39 to 0.69 for the four months specified. This shows that the cluster models of all climate variable combinations form reasonably dense and well-separated clusters. In both the K-means and FCM methods, the single wind variable gives the best-defined clusters, with a Silhouette index above 0.6 from August until November. The climate variables with the best correlation to burned area (see the grey shading in Table 4), namely RH for the K-means method and PCP-RH for the FCM method, have Silhouette values between 0.5 and 0.6. These values indicate that the variables with the best correlation in each method also have a good, although not the best, evaluation value.

Conclusions
This paper studied the correlation between the occurrence of forest fires and climate variables over Borneo using two clustering methods, K-means and FCM. Single climate variables and combinations of them, as well as the burned area, are each grouped into four cluster areas. Both the K-means and FCM clustering methods show that relative humidity contributes most to the burned area. In the K-means clustering method, a single climate variable (relative humidity) shows a strong negative correlation with the burned area in most of the dry season. In the FCM clustering method, the combination of relative humidity and precipitation gives the largest negative correlation with the burned area. Unlike in the other months, both methods show that temperature is more likely to affect fires in November. The clustering results are evaluated using the Silhouette index. Based on the Silhouette index values, all the clustered climate variables form dense and well-separated clusters.