Comparison of Elliptic Envelope Method and Isolation Forest Method on Imbalance Dataset
DOI:
https://doi.org/10.20956/jmsk.v17i1.10899
Keywords:
Data mining, Imbalance Datasets, Classification, Elliptic Envelope, Isolation Forest
Abstract
Class imbalance is an important problem in data mining. A dataset is imbalanced when some classes occur far more or far less frequently than others, and this imbalance biases a classifier's performance. Many researchers have addressed the problem, both by developing new algorithms and by modifying the preprocessing stage. This study compares two one-class classification algorithms, Elliptic Envelope and Isolation Forest, on imbalanced data. In testing, the Elliptic Envelope method outperformed Isolation Forest, achieving 80.28% recall and 80.28% precision, compared with 46.95% recall and 46.95% precision for Isolation Forest.
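The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration only, assuming scikit-learn's `EllipticEnvelope` and `IsolationForest` implementations and a small synthetic dataset. The data generator, the 95/5 class ratio, the `contamination` setting, and the helper names (`make_imbalanced_data`, `evaluate`, `compare`) are all assumptions made for the example and are not taken from the study. The sketch also fits and scores on the same data rather than on a held-out test set, so its numbers are not comparable to the reported results.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score


def make_imbalanced_data(n_major=950, n_minor=50, seed=0):
    """Synthetic stand-in data (an assumption, not the study's dataset)."""
    rng = np.random.default_rng(seed)
    # Majority class: a Gaussian cluster around the origin, labeled +1.
    X_major = rng.normal(loc=0.0, scale=1.0, size=(n_major, 2))
    # Minority class: points spread uniformly over a wider square, labeled -1.
    X_minor = rng.uniform(low=-6.0, high=6.0, size=(n_minor, 2))
    X = np.vstack([X_major, X_minor])
    y = np.concatenate([np.ones(n_major, dtype=int),
                        -np.ones(n_minor, dtype=int)])
    return X, y


def evaluate(model, X, y):
    """Fit without labels, then score the minority class (-1)."""
    # Both estimators predict +1 for inliers and -1 for outliers.
    y_pred = model.fit(X).predict(X)
    return {
        "precision": precision_score(y, y_pred, pos_label=-1),
        "recall": recall_score(y, y_pred, pos_label=-1),
    }


def compare(seed=0):
    X, y = make_imbalanced_data(seed=seed)
    # Assumption: the minority fraction is known and used as contamination.
    contamination = float(np.mean(y == -1))
    models = {
        "EllipticEnvelope": EllipticEnvelope(
            contamination=contamination, random_state=seed),
        "IsolationForest": IsolationForest(
            contamination=contamination, random_state=seed),
    }
    return {name: evaluate(model, X, y) for name, model in models.items()}


if __name__ == "__main__":
    for name, scores in compare().items():
        print(f"{name}: precision={scores['precision']:.4f}, "
              f"recall={scores['recall']:.4f}")
```

Both models are trained without labels. The labels are used only afterwards to compute precision and recall for the minority class, which is the usual way of evaluating one-class methods on a labeled imbalanced dataset.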
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Jurnal Matematika, Statistika dan Komputasi is an Open Access journal. All articles are distributed under the terms of the Creative Commons Attribution License, which allows third parties to copy and redistribute the material in any medium or format, and to transform and build upon the material, provided the original work is properly cited and its license is stated. This license allows authors and readers to use all articles, data sets, graphics and appendices in data mining applications, search engines, web sites, blogs and other platforms, provided appropriate reference is given.