Comparison of Logistic Regression and Random Forest Methods for Imbalanced Data Classification (Case Study: Classification of Poor Households in Karangasem Regency, Bali, 2017)

Taly Purwa

doi:10.20956/jmsk.v16i1.6494

Authors

Taly Purwa, Badan Pusat Statistik (BPS) Provinsi Bali

Keywords:

Poverty, Imbalanced data, Logistic Regression, Random Forest, Stratified 5-fold CV, Undersampling, Oversampling, Combine sampling

Abstract

This study aims to find the best model for classifying imbalanced data, namely the sample households of the March 2017 Susenas (National Socio-Economic Survey) in Karangasem Regency, into poor and non-poor categories. The methods used are Logistic Regression and Random Forest, each combined with a cross-validation (CV) scheme (stratified 5-fold CV), with undersampling, oversampling and combine sampling schemes to address the imbalanced-data problem, and with a feature selection step. The results show that applying undersampling, oversampling or combine sampling to the Logistic Regression model increases the mean sensitivity while decreasing the mean accuracy and specificity. For the Random Forest model, this effect appears only under the undersampling scheme. Feature selection reduces the variance of accuracy, specificity, sensitivity and AUC for both Logistic Regression and Random Forest, but only under certain schemes. The best model overall is Logistic Regression with combine sampling and without feature selection, with mean accuracy, specificity, sensitivity and AUC of 78.13%, 79.16%, 64.44% and 77.77%, respectively.
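The abstract describes an evaluation pipeline: stratified 5-fold CV, with undersampling, oversampling or combine sampling used against class imbalance, scored by accuracy, sensitivity, specificity and AUC. This page does not give implementation details, so what follows is only a minimal, standard-library Python sketch under stated assumptions: label 1 marks a poor household (the minority class), resampling is plain random resampling applied only to the training folds, and the "combine" scheme is an assumed simplification that oversamples the minority and undersamples the majority to half the training size each. All function names are hypothetical, model fitting (logistic regression or random forest) is left out, and the paper's own resampling procedure may differ.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split row indices into k folds that keep roughly the class proportions of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's shuffled indices round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def _split_classes(train_idx, labels):
    """Return (minority, majority) index lists, assuming binary labels 0/1."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(train_idx, labels, seed=0):
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_classes(train_idx, labels)
    return minority + rng.sample(majority, len(minority))


def random_oversample(train_idx, labels, seed=0):
    """Duplicate minority-class rows (with replacement) until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_classes(train_idx, labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def combine_sample(train_idx, labels, seed=0):
    """Assumed 'combine' scheme: both classes resampled to half the training size."""
    rng = random.Random(seed)
    minority, majority = _split_classes(train_idx, labels)
    target = len(train_idx) // 2
    kept = rng.sample(majority, target)  # undersample the majority class
    grown = minority + [rng.choice(minority) for _ in range(target - len(minority))]
    return kept + grown


def cv_splits(labels, k=5, resample=None, seed=0):
    """Yield (train, test) index lists; only the training part is resampled."""
    folds = stratified_k_fold(labels, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        if resample is not None:
            train_idx = resample(train_idx, labels, seed + f)
        yield train_idx, test_idx


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall of `positive`) and specificity from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def auc(y_true, scores, positive=1):
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Applying the resampler inside `cv_splits`, rather than before splitting, keeps every test fold at the original class ratio, so sensitivity and specificity are measured on data with the same imbalance as the survey sample. Any classifier that produces hard labels and scores for the test indices can be plugged into the loop.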

Published

2019-06-27

How to Cite

Purwa, T. (2019). Perbandingan Metode Regresi Logistik dan Random Forest untuk Klasifikasi Data Imbalanced (Studi Kasus: Klasifikasi Rumah Tangga Miskin di Kabupaten Karangasem, Bali Tahun 2017). Jurnal Matematika, Statistika Dan Komputasi, 16(1), 58–73. https://doi.org/10.20956/jmsk.v16i1.6494

Issue

Vol. 16 No. 1 (2019): JMSK, July, 2019

Section

Research Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Jurnal Matematika, Statistika dan Komputasi is an Open Access journal; all articles are distributed under the terms of the Creative Commons Attribution License, which allows third parties to copy and redistribute the material in any medium or format, and to transform and build upon it, provided the original work is properly cited and its license is stated. This license allows authors and readers to use all articles, data sets, graphics and appendices in data mining applications, search engines, websites, blogs and other platforms, provided appropriate reference is given.
