Metric data are essentially streams coming from feeds, so the algorithm has to recalculate the weights of the
predictive model for each new incoming data point in order to keep the results as accurate as possible (online learning).
Apache Spark was chosen as the streaming and distributed data processing engine. At the moment only the local mode of
Apache Spark is used.
The following chart shows the regression line of the 'Heap Used' metric. Currently, the line is drawn on
top of the historical data; the plan is to draw it to the right of the data to denote future predictions.
Figure 2. The yellow line shows the result of linear regression on the 'Heap Used' metric.
So far I have been using linear regression with stochastic gradient descent. However, this model doesn’t fulfill
the requirements: it is not accurate enough and cannot predict seasonality. Still, it was a good start for establishing the architecture.
Next, I’m going to continue with more sophisticated methods like ARIMA (autoregressive integrated moving average) and investigate
the possibility of using neural networks.
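
To make the online learning idea concrete, here is a minimal sketch of linear regression updated with stochastic gradient descent, one data point at a time. It is plain Python rather than the Spark code used in the project, and the class name, the learning rate and the synthetic feed are all assumptions made up for illustration.

```python
class OnlineLinearRegression:
    """Fits y ~ w * x + b, updating the weights after every observation."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate  # assumed value, needs tuning on real data

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One SGD step on the squared error of a single (x, y) pair.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


def synthetic_feed(n):
    """Hypothetical stand-in for a metric feed: noiseless points on y = 2x + 1.

    x is kept in [0, 1]; real timestamps would have to be scaled similarly,
    otherwise the SGD steps can diverge.
    """
    for t in range(n):
        x = (t % 11) / 10.0
        yield x, 2.0 * x + 1.0


model = OnlineLinearRegression()
for x, y in synthetic_feed(3000):
    model.update(x, y)  # weights are recalculated for each incoming point

print(model.w, model.b)
```

On this synthetic feed the weights approach the slope and intercept of the generating line. A straight line like this has no way of representing a repeating pattern, which is why seasonality calls for a different model.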