Data Mining Documentation


Introduction

Data Mining currently offers a time series prediction engine. The engine can be used for alert prediction or for predictive charts in a user interface. The project is split into several modules, so that the time series models alone, or the web application without the Hawkular integration code, can be used separately.

The most important Hawkular Data Mining modules:

Table 1. Data Mining main modules

Artifact                       Description
hawkular-datamining-forecast   Lightweight library for time series modeling and forecasting.
hawkular-datamining-rest       Web archive for standalone usage without Hawkular.
hawkular-datamining-dist       Web archive with Hawkular integration code, designed to be deployed in Hawkular.

In the standalone distribution, metric data has to be pushed to Data Mining; however, it can easily be customized to use any metrics storage.

Forecast module

The forecast artifact is a lightweight library for time series modeling and forecasting. It can easily be used in any Java project. The following time series models and statistical functions are implemented:

Time series models

  • Simple exponential smoothing

  • Double exponential smoothing (Holt’s linear trend)

  • Seasonal triple exponential smoothing (Holt Winters)

  • Simple moving average

  • Weighted moving average

All variants of exponential smoothing contain an optimizer that finds the best smoothing parameters for a given training data set. The optimizers minimize the mean squared error of one-step-ahead predictions using a nonlinear optimization algorithm (Nelder-Mead simplex).
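To make the optimization target concrete, here is a minimal sketch of simple exponential smoothing together with the mean squared error of its one-step-ahead predictions. It is not the hawkular-datamining-forecast API: the class and method names are hypothetical, and a coarse grid search over the smoothing factor stands in for the Nelder-Mead simplex used by the library.

```java
/** Illustrative sketch only; names are hypothetical, not the library's API. */
final class SesSketch {

    /** Mean squared error of one-step-ahead forecasts for a given smoothing factor. */
    static double oneStepMse(double[] series, double alpha) {
        if (series.length < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double level = series[0];               // initialise the level with the first value
        double sumSquared = 0.0;
        for (int t = 1; t < series.length; t++) {
            double error = series[t] - level;   // forecast for time t is the previous level
            sumSquared += error * error;
            level = alpha * series[t] + (1 - alpha) * level;
        }
        return sumSquared / (series.length - 1);
    }

    /** Grid search over alpha in (0, 1]; a simple stand-in for Nelder-Mead. */
    static double bestAlpha(double[] series) {
        double best = 0.01;
        double bestMse = Double.POSITIVE_INFINITY;
        for (int i = 1; i <= 100; i++) {
            double alpha = i / 100.0;
            double mse = oneStepMse(series, alpha);
            if (mse < bestMse) {
                bestMse = mse;
                best = alpha;
            }
        }
        return best;
    }

    /** Simple exponential smoothing forecasts the final level for every horizon. */
    static double forecast(double[] series, double alpha) {
        double level = series[0];
        for (int t = 1; t < series.length; t++) {
            level = alpha * series[t] + (1 - alpha) * level;
        }
        return level;
    }

    public static void main(String[] args) {
        double[] series = {12.0, 13.5, 12.8, 14.1, 13.9, 15.2};
        double alpha = bestAlpha(series);
        System.out.println("alpha = " + alpha + ", next value = " + forecast(series, alpha));
    }
}
```

Double and triple exponential smoothing have two and three smoothing parameters respectively, which is where a multidimensional optimizer such as Nelder-Mead becomes more practical than a grid.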

AutomaticForecaster can be used to select the best model. It chooses the model based on the Akaike information criterion (AIC, or AICc with small-sample correction) or the Bayesian information criterion (BIC). If the time series changes over time, AutomaticForecaster can switch to a different model than the one previously selected, so it can be used on arbitrary time series streams that exhibit concept drift.
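As a rough illustration of criterion-based model selection, the sketch below computes AIC, AICc and BIC from a model's sum of squared errors, using the common least-squares form of the criteria (constant terms dropped). This is an assumption about the formula, not a description of how AutomaticForecaster computes it, and the class, method names and example numbers are hypothetical. Lower values indicate a better trade-off between fit and number of parameters.

```java
/** Illustrative sketch only; not the AutomaticForecaster implementation. */
final class InformationCriteriaSketch {

    /** AIC for n observations, k estimated parameters and sum of squared errors sse. */
    static double aic(double sse, int n, int k) {
        return n * Math.log(sse / n) + 2.0 * k;
    }

    /** AICc: AIC plus a correction that matters for small samples. */
    static double aicc(double sse, int n, int k) {
        if (n - k - 1 <= 0) {
            throw new IllegalArgumentException("AICc requires n > k + 1");
        }
        return aic(sse, n, k) + (2.0 * k * (k + 1)) / (n - k - 1);
    }

    /** BIC penalises each parameter by ln(n) instead of 2. */
    static double bic(double sse, int n, int k) {
        return n * Math.log(sse / n) + k * Math.log(n);
    }

    public static void main(String[] args) {
        int n = 50;
        // Hypothetical fit results: a 1-parameter model and a 3-parameter model.
        double simple = aicc(120.0, n, 1);
        double seasonal = aicc(95.0, n, 3);
        System.out.println(simple < seasonal ? "simple model preferred" : "seasonal model preferred");
    }
}
```

Re-evaluating the criteria on a recent window of data is one way a different model could be chosen once the series drifts.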

Time series manipulation & Statistics

  • Augmented Dickey-Fuller test

  • Autocorrelation function (ACF)

  • Time series decomposition

  • Time series lagging

  • Time series differencing

  • Automatic period identification based on ACF
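Three of the operations listed above (differencing, the autocorrelation function, and ACF-based period identification) can be sketched in plain Java as follows. The class and method names are hypothetical and do not mirror the library's API, and picking the lag with the highest autocorrelation is only one simple heuristic for finding a period; the library's actual procedure may differ.

```java
/** Illustrative sketch only; names are hypothetical, not the library's API. */
final class TimeSeriesStatsSketch {

    /** Differencing: y[t] = x[t] - x[t - lag]. */
    static double[] difference(double[] x, int lag) {
        if (lag < 1 || lag >= x.length) {
            throw new IllegalArgumentException("lag must be in [1, length)");
        }
        double[] out = new double[x.length - lag];
        for (int t = lag; t < x.length; t++) {
            out[t - lag] = x[t] - x[t - lag];
        }
        return out;
    }

    /** Sample autocorrelation at the given lag (assumes a non-constant series). */
    static double acf(double[] x, int lag) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double denominator = 0.0;
        for (double v : x) {
            denominator += (v - mean) * (v - mean);
        }
        double numerator = 0.0;
        for (int t = lag; t < x.length; t++) {
            numerator += (x[t] - mean) * (x[t - lag] - mean);
        }
        return numerator / denominator;
    }

    /** Heuristic: the lag in [2, maxLag] with the highest autocorrelation. */
    static int estimatePeriod(double[] x, int maxLag) {
        int bestLag = 2;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int lag = 2; lag <= maxLag && lag < x.length; lag++) {
            double value = acf(x, lag);
            if (value > bestValue) {
                bestValue = value;
                bestLag = lag;
            }
        }
        return bestLag;
    }

    public static void main(String[] args) {
        double[] series = new double[20];
        double[] pattern = {0, 1, 0, -1};            // one cycle of a period-4 signal
        for (int t = 0; t < series.length; t++) {
            series[t] = pattern[t % 4];
        }
        System.out.println("estimated period = " + estimatePeriod(series, 8));
    }
}
```

Differencing is typically applied before modeling to remove a trend, while the estimated period can serve as the season length for the seasonal triple exponential smoothing model.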

Data Mining in Hawkular

When Data Mining is deployed in Hawkular, predictions can be enabled by creating relationships in Inventory:

  • from tenant to tenant, to forecast all metrics of the given tenant

  • from tenant to metric type, to forecast all metrics of the given type

  • from tenant to metric, to forecast the given metric

The following diagram depicts the interaction with other modules. The numbers denote the order in which metric data flows from the agent to the Data Mining and Metrics modules.

Figure 1. Data Mining interaction with other Hawkular modules

Prerequisites

The Hawkular Data Mining Git repository is hosted on GitHub. Maven and Java 8 are needed to build it.


© 2016 | Hawkular is released under Apache License v2.0