Machine Learning-Based Univariate Time Series Imputation Method for Estimating Missing Values in Non-Stationary Data

Authors

  • Dini Ramadhani, Statistics and Data Science Study Program, Faculty of Mathematics and Natural Sciences, Institut Pertanian Bogor, Indonesia
  • Agus Mohamad Soleh, Statistics and Data Science Study Program, Faculty of Mathematics and Natural Sciences, Institut Pertanian Bogor, Indonesia
  • Erfiani Erfiani, Statistics and Data Science Study Program, Faculty of Mathematics and Natural Sciences, Institut Pertanian Bogor, Indonesia

DOI:

https://doi.org/10.20956/j.v21i1.36468

Keywords:

Imputation, Non-Stationary Data, Machine Learning, Missing Values

Abstract

Handling missing values in time series data is crucial because they can disrupt analysis and interpretation. Consecutive (sequentially) missing values are often harder to handle than values missing at random positions. One promising recent method is Machine Learning-Based Univariate Time Series Imputation (MLBUI), although it is not yet widely used and its accessibility is limited. MLBUI employs the Random Forest Regression (RFR) and Support Vector Regression (SVR) algorithms. This study evaluates the performance of MLBUI under several missing-data scenarios in non-stationary univariate time series. The data are average temperature records from Bogor Regency, and the scenarios use missing rates of 6%, 10%, and 14%. Five comparison methods are evaluated alongside MLBUI: Kalman StructTS, Kalman Auto-ARIMA, Spline Interpolation, Stine Interpolation, and Moving Average. The results show that MLBUI performs poorly on non-stationary data, even though its Mean Absolute Percentage Error (MAPE) remains below 10%.
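
The evaluation protocol described in the abstract (remove a consecutive block at a given missing rate, impute it, then score with MAPE) can be illustrated with a minimal sketch. This is not the authors' code and it does not implement MLBUI: linear interpolation is swapped in as a stand-in imputer, the series is synthetic rather than the Bogor temperature data, and the helper names `make_gap`, `impute_linear`, and `mape` are hypothetical.

```python
import numpy as np

def make_gap(series, rate, start):
    """Return a copy of `series` with one consecutive block set to NaN."""
    gap_len = int(round(rate * len(series)))
    out = series.astype(float).copy()
    out[start:start + gap_len] = np.nan
    return out

def impute_linear(series):
    """Stand-in imputer: fill NaNs by linear interpolation (not MLBUI)."""
    idx = np.arange(len(series))
    observed = ~np.isnan(series)
    filled = series.copy()
    filled[~observed] = np.interp(idx[~observed], idx[observed], series[observed])
    return filled

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Synthetic non-stationary series (assumption): upward trend plus a cycle.
t = np.arange(200)
truth = 25.0 + 0.01 * t + 1.5 * np.sin(2 * np.pi * t / 50)

results = {}
for rate in (0.06, 0.10, 0.14):  # missing rates named in the abstract
    gappy = make_gap(truth, rate, start=80)
    filled = impute_linear(gappy)
    missing = np.isnan(gappy)
    # Score only the positions that were artificially removed.
    results[rate] = mape(truth[missing], filled[missing])
    print(f"rate={rate:.0%}  gap={missing.sum()}  MAPE={results[rate]:.2f}%")
```

Scoring only the removed positions keeps the error from being diluted by the observed values, which are reproduced exactly by any imputer. Replacing `impute_linear` with another method allows a like-for-like comparison on the same gaps.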

Published

2024-09-15

How to Cite

Ramadhani, D., Soleh, A. M., & Erfiani, E. (2024). Machine Learning-Based Univariate Time Series Imputation Method for Estimating Missing Values in Non-Stationary Data. Jurnal Matematika, Statistika Dan Komputasi, 21(1), 307–320. https://doi.org/10.20956/j.v21i1.36468

Section

Research Articles