Machine Learning-Based Univariate Time Series Imputation Method for Estimating Missing Values in Non-Stationary Data
DOI:
https://doi.org/10.20956/j.v21i1.36468
Keywords:
Imputation, Non-Stationary Data, Machine Learning, Missing Values
Abstract
Handling missing values in time series data is crucial because they can disrupt data analysis and interpretation. Consecutive (sequentially) missing values in a time series often pose a more complex challenge than randomly missing values. One promising recent method is Machine Learning-Based Univariate Time Series Imputation (MLBUI), although it is not yet widely used and its accessibility is limited. MLBUI employs the Random Forest Regression (RFR) and Support Vector Regression (SVR) algorithms. This study evaluates the performance of MLBUI in missing data scenarios for non-stationary univariate time series. The data used are average temperature records from Bogor Regency. The missing data scenarios considered have missing rates of 6%, 10%, and 14%. MLBUI is compared with five other methods: Kalman StructTS, Kalman Auto-ARIMA, Spline Interpolation, Stine Interpolation, and Moving Average. The results show that MLBUI performs poorly on non-stationary data, even though its Mean Absolute Percentage Error (MAPE) remains below 10%.
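The evaluation protocol described in the abstract (mask a consecutive block at a given missing rate, impute it, and score the imputed values with MAPE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the series is synthetic rather than the Bogor temperature data, the gap position and helper names are hypothetical, and plain linear interpolation stands in as the imputer. It is not MLBUI and not any of the comparison methods as configured in the study.

```python
import numpy as np


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


def make_consecutive_gap(series, rate, start):
    """Return a copy with one consecutive block of NaNs covering `rate` of the series."""
    gap_len = int(round(rate * len(series)))
    masked = np.asarray(series, dtype=float).copy()
    masked[start:start + gap_len] = np.nan
    return masked


def interpolate_linear(masked):
    """Fill NaNs by linear interpolation between observed neighbours (stand-in imputer)."""
    idx = np.arange(len(masked))
    observed = ~np.isnan(masked)
    filled = masked.copy()
    filled[~observed] = np.interp(idx[~observed], idx[observed], masked[observed])
    return filled


# Synthetic non-stationary series (assumption): trend + seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(500)
series = 25.0 + 0.004 * t + 1.5 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.2, t.size)

results = {}
for rate in (0.06, 0.10, 0.14):  # missing rates named in the abstract
    masked = make_consecutive_gap(series, rate, start=200)  # gap position is arbitrary
    filled = interpolate_linear(masked)
    gap = np.isnan(masked)
    # Score only the positions that were artificially removed.
    results[rate] = mape(series[gap], filled[gap])
    print(f"missing rate {rate:.0%}: MAPE = {results[rate]:.2f}%")
```

Scoring only the masked positions keeps the error from being diluted by the observed values, which any imputer reproduces exactly. To compare methods under this setup, one would swap `interpolate_linear` for another imputer and keep the masking and scoring steps unchanged.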
License
Copyright (c) 2024 Jurnal Matematika, Statistika dan Komputasi
This work is licensed under a Creative Commons Attribution 4.0 International License.
Jurnal Matematika, Statistika dan Komputasi is an Open Access journal. All articles are distributed under the terms of the Creative Commons Attribution License, which allows third parties to copy and redistribute the material in any medium or format, and to transform and build upon it, provided the original work is properly cited and its license is stated. This license allows authors and readers to use all articles, data sets, graphics, and appendices in data mining applications, search engines, websites, blogs, and other platforms, provided appropriate reference is given.