Introduction to Hawkular Data Mining Module

A blog post by Pavol Loffay

I would like to introduce a module that will predict alerts and show metric forecasts in the Hawkular UI.

Architecture

Let’s start with the architecture and how it fits into Hawkular. This module isn’t intended to work standalone. The algorithm used inside the engine will eventually change, but the architecture will remain as described in the following diagram. On application startup, or when forecasting is enabled for a given metric, the predictive model is initialized with data from the metrics module. When new metric data becomes available on the bus, the module recalculates the model and provides users with better predictions. For alert prediction it is crucial to know the threshold value: the predicted value is compared to the threshold and, if necessary, an alert event is posted on the bus. Predicted values for arbitrary timestamps can be queried through the REST API.

Figure 1. Architecture
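
The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the module's actual code: the class names, the `publish` callback standing in for the bus, the fixed prediction horizon and the "greater than threshold" rule are all assumptions made for the example, and a plain least-squares line stands in for the predictive model.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


class LinearModel:
    """Hypothetical stand-in for the predictive model: value = slope * t + intercept."""

    def __init__(self) -> None:
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, points: List[Point]) -> None:
        # Ordinary least squares over all known points.
        n = len(points)
        if n == 0:
            return
        mean_t = sum(t for t, _ in points) / n
        mean_v = sum(v for _, v in points) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in points)
        cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
        self.slope = cov / var_t if var_t else 0.0
        self.intercept = mean_v - self.slope * mean_t

    def predict(self, t: float) -> float:
        return self.slope * t + self.intercept


class ForecastEngine:
    """Assumed wiring: history in, new data from the bus, alert events out."""

    def __init__(self, history: List[Point], threshold: float, horizon: float,
                 publish: Callable[[Dict[str, float]], None]) -> None:
        self.points = list(history)
        self.threshold = threshold
        self.horizon = horizon      # how far ahead to look (assumption)
        self.publish = publish      # stands in for posting to the bus
        self.model = LinearModel()
        self.model.fit(self.points)  # initialize from the metrics module data

    def on_metric(self, timestamp: float, value: float) -> float:
        # New data arrived: recalculate the model, then check the threshold.
        self.points.append((timestamp, value))
        self.model.fit(self.points)
        predicted = self.model.predict(timestamp + self.horizon)
        if predicted > self.threshold:
            self.publish({"timestamp": timestamp + self.horizon,
                          "predicted": predicted})
        return predicted


events: List[Dict[str, float]] = []
engine = ForecastEngine(history=[(0, 10), (1, 20), (2, 30)],
                        threshold=75, horizon=3, publish=events.append)
engine.on_metric(3, 40)  # prediction for t=6 stays below the threshold
engine.on_metric(4, 50)  # prediction for t=7 crosses it, so one event is published
```

Refitting on the whole history for every data point keeps the sketch short; an online update, as discussed in the Algorithm section, avoids that cost.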

This module is not yet included in the main Hawkular distribution, so it has to be started from a separate application server. To run the application, execute Maven with the dev profile. The build produces a configured WildFly instance in the target directory of the hawkular-datamining-dist module. Later it will be possible to deploy it as a standalone web application in the Hawkular server.

Assuming the Hawkular server is running:

mvn clean install -Pdev && ./hawkular-datamining-dist/target/wildfly-*/bin/standalone.sh

Algorithm

Metric data are essentially streams of data points coming from feeds, so the algorithm has to recalculate the weights of the predictive model for each new incoming data point in order to get the most accurate results (online learning). Apache Spark was chosen as the streaming and distributed data processing engine. At the moment only the local mode of Apache Spark is used.
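
To make the online learning idea concrete, here is a minimal sketch of a per-point stochastic gradient descent update for a one-variable linear model. It is plain Python rather than the Spark-based implementation the module uses, and the class name, learning rate and sample stream are assumptions chosen for the example. The input is assumed to be scaled to a small range; with raw timestamps as input, a fixed learning rate like this would diverge.

```python
class OnlineLinearRegression:
    """Illustrative model y = w * x + b, updated one data point at a time."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> float:
        # One SGD step on the squared error 0.5 * (prediction - y)^2.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


model = OnlineLinearRegression(learning_rate=0.1)

# Simulated stream: points on the line y = 2x + 1, with x already scaled to [0, 1].
stream = [(x, 2.0 * x + 1.0) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
for _ in range(2000):
    for x, y in stream:
        model.update(x, y)  # weights are adjusted after every single point
```

Each update touches only the current point, so the model never needs the full history in memory, which is what makes the approach suitable for streams.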

The following chart shows the regression line for the 'Heap Used' metric. Currently the line is drawn on top of the historical data; the plan is to draw it to the right of the data to denote future predictions.

Figure 2. The yellow line shows the result of linear regression on the 'Heap Used' metric.

So far I have been using linear regression with stochastic gradient descent. However, this model cannot produce accurate results or predict seasonality. It was a good start for establishing the architecture. Next I’m going to continue with more sophisticated methods such as ARIMA (autoregressive integrated moving average) and investigate the possibility of using neural networks.
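
The seasonality limitation is easy to reproduce on synthetic data. The sketch below is an illustration only: the series, its period of 24 samples and the amplitudes are made up. It fits a straight line to a signal with a trend and a repeating cycle, then applies seasonal differencing, the transformation that seasonal ARIMA models build on.

```python
import math

PERIOD = 24  # assumed season length, e.g. hourly samples with a daily cycle


def make_series(n: int) -> list:
    # Synthetic metric: linear trend plus a repeating seasonal component.
    return [0.5 * t + 10.0 * math.sin(2 * math.pi * t / PERIOD) for t in range(n)]


def fit_line(values: list) -> tuple:
    # Ordinary least squares of value against sample index.
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


values = make_series(4 * PERIOD)
slope, intercept = fit_line(values)

# The line captures the trend, but the seasonal swing stays in the residuals.
residuals = [v - (slope * t + intercept) for t, v in enumerate(values)]
max_residual = max(abs(r) for r in residuals)

# Seasonal differencing: subtract the value one full period earlier.
# For this synthetic series the seasonal part cancels and a constant remains.
differenced = [values[t] - values[t - PERIOD] for t in range(PERIOD, len(values))]
spread = max(differenced) - min(differenced)
```

Here `max_residual` remains on the order of the seasonal amplitude, while `spread` is close to zero, which is why a model that handles the seasonal structure explicitly is the natural next step.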

Goals

Provide alert prediction for a given set of metrics. Users should be able to enable prediction optionally.
Show a forecast of the near future in the Hawkular UI.

Next steps

Customize charts for forecasting (HAWKULAR-738)
Seasonal ARIMA model
Forecasting with multi-layer neural networks
