Forecasting Using K-means Clustering and RNN Methods with PCA Feature Selection

Authors

  • Ferna Marestiani, Ahmad Dahlan University, Yogyakarta, Indonesia
  • Sugiyarto Surono, Ahmad Dahlan University, Yogyakarta, Indonesia

Keywords:

PCA, K-means Clustering, RNN, BPTT, SGD

Abstract

Artificial neural networks (ANNs) are computing systems inspired by the human nervous system, and the field continues to grow rapidly. Like the human nervous system, an ANN learns from existing data in order to produce outputs for new data. The Recurrent Neural Network (RNN) is one of the most popular ANN models in use today, especially for forecasting. In simple terms, forecasting with an RNN proceeds by splitting the data into training and test sets, running the forward pass, running the backward pass, applying an optimization step, and evaluating the forecasting model. The main obstacle of the RNN method is the vanishing gradient problem, which can lead to poor forecasting results. In this study, the authors propose using Principal Component Analysis (PCA) for dimensionality reduction, so that the most influential variables become the inputs to the prediction model and its error is reduced. The authors also use K-means clustering to group data with similar trend variations, with similarity measured by Euclidean distance to improve the clustering effect. To build an optimal prediction model, the most influential variables of the time series data are first selected with PCA; the data are then grouped with K-means and fed into the prediction model. The RNN prediction model is trained with Backpropagation Through Time (BPTT) and optimized with Stochastic Gradient Descent (SGD). Forecasting with the RNN and PCA yields an accuracy of 93%, whereas forecasting with the RNN without PCA yields an accuracy of 82%. The experimental results show that the RNN method with PCA achieves higher predictive accuracy and flexibility than the RNN without PCA.
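
The RNN forecasting flow named in the abstract (train/test split, forward pass, backward pass with BPTT, SGD update, evaluation) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the single-unit tanh hidden state, the squared-error loss, the synthetic sine series, the window length, the learning rate, and all function names. The PCA and K-means preprocessing steps are omitted here.

```python
import math
import random

def make_windows(series, window):
    """Turn a series into (input window, next value) pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def init_params(seed=0):
    """Small random start for [w_x, w_h, b, w_y, c] (hypothetical layout)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(5)]

def forward(params, xs):
    """Forward pass: return hidden states h_0..h_T and the prediction."""
    w_x, w_h, b, w_y, c = params
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1] + b))
    return hs, w_y * hs[-1] + c

def bptt(params, xs, target):
    """Backward pass: gradients of 0.5 * (pred - target)**2 via BPTT."""
    w_x, w_h, b, w_y, c = params
    hs, pred = forward(params, xs)
    err = pred - target
    g_wx = g_wh = g_b = 0.0
    g_wy, g_c = err * hs[-1], err
    dh = err * w_y                      # gradient flowing into h_T
    for t in range(len(xs), 0, -1):     # walk backwards through time
        da = dh * (1.0 - hs[t] ** 2)    # through the tanh at step t
        g_wx += da * xs[t - 1]
        g_wh += da * hs[t - 1]
        g_b += da
        dh = da * w_h                   # gradient flowing into h_{t-1}
    return [g_wx, g_wh, g_b, g_wy, g_c]

def mse(params, pairs):
    """Evaluation: mean squared error of one-step-ahead forecasts."""
    return sum((forward(params, xs)[1] - y) ** 2 for xs, y in pairs) / len(pairs)

def train(pairs, params, epochs=200, lr=0.05, seed=0):
    """SGD: one parameter update per training window, in shuffled order."""
    rng = random.Random(seed)
    pairs, params = list(pairs), list(params)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xs, y in pairs:
            grads = bptt(params, xs, y)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

# Synthetic stand-in for a real time series (assumption, not the paper's data).
series = [math.sin(0.3 * t) for t in range(120)]
pairs = make_windows(series, window=5)
split = int(0.8 * len(pairs))           # chronological 80/20 split
train_pairs, test_pairs = pairs[:split], pairs[split:]

init = init_params()
params = train(train_pairs, init)
print("test MSE before training:", mse(init, test_pairs))
print("test MSE after training: ", mse(params, test_pairs))
```

In `bptt`, the gradient passed back one step is multiplied by `w_h` and by the tanh derivative `1 - h_t**2`, both of which are typically below 1 in magnitude. Repeating that product over long windows is what shrinks the signal from early time steps, which is the vanishing gradient issue the abstract refers to.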

Published

2022-06-14

How to Cite

Marestiani, F., & Surono, S. (2022). Forecasting Using K-means Clustering and RNN Methods with PCA Feature Selection. Journal of Data Science, 2022. Retrieved from https://iuojs.intimal.edu.my/index.php/jods/article/view/43