Hawkular - Data Mining Documentation

Introduction
Forecast module
- Time series models
- Time series manipulation & Statistics
Data Mining in Hawkular
Prerequisites
Useful Links

Introduction

Data Mining currently offers time series prediction engine. This engine can be used for alert prediction or predictive charts in a user interface. Project is decomposed into several modules for ability of using only time series models or even web application without Hawkular integration code.

The most important Hawkular Data Mining modules:

Table 1. Data Mining main modules
Artifact	Description
hawkular-datamining-forecast	Lightweight library for time series modeling and forecasting.
hawkular-datamining-rest	Web archive for standalone usage without Hawkular.
hawkular-datamining-dist	Web archive with Hawkular integration code designed to be deployed in Hawkular.

In standalone distribution, metric data has to be pushed to Data Mining, however easy customization can be made to use any metrics storage.

Forecast module

The forecast artifact is lightweight library for time series modelling and forecasting. It can be easy used in any java project. Following time series models and statistical functions are implemented:

Time series models

Simple exponential smoothing
Double exponential smoothing (Holt’s linear trend)
Seasonal triple exponential smoothing (Holt Winters)
Simple moving average
Weighted moving average

All variants of exponential smoothing contains optimizer which finds the best smoothing parameters for given training data set. Optimizers minimizes mean squared error of one step ahead prediction using non linear optimization algorithm (Nelder-Mead simplex).

AutomaticForecaster can be used for the best model selecting. It selects the best model based on Akaike information criterion (AIC and AICc with correction) or Bayesian information criterion (BIC). If the time series changes during time, AutomaticForecaster is able to select different model from previously selected, therefore it can be used on arbitrary time series stream data which exhibits concept drift over time.

Time series manipulation & Statistics

Augmented Dickey-Fuller test
Autocorrelation function (ACF)
Time series decomposition
Time series lagging
Time series differencing
Automatic period identification based on ACF

Data Mining in Hawkular

When Data Mining is deployed in Hawkular, predictions can be enabled by creating relationships in Inventory:

from tenant to tenant for forecasting all metrics of given tenant
from tenant to metric type for forecasting all metrics of given type
from tenant to metric for forecasting given metric

Following diagram depicts the interaction with other modules. Numbers denotes the order of metric data flow from agent to Data Mining and Metrics module.

Figure 1. Data Mining interaction with other Hawkular modules

Prerequisites

Hawkular Data Mining Git repository is hosted on GitHub. To build the repository Maven and Java 8 are needed.

Useful Links

Source code Github
Data Mining integrated into Hawkular
REST API documentation page