
Hawkular Blog

Monitoring Microservices on OpenShift with HOSA

17 January 2017, by Heiko W. Rupp

Monitoring Microservices on OpenShift with the Hawkular OpenShift Agent

Monitoring Microservices on orchestrated platforms like OpenShift is a very different endeavor than the classical monitoring of monoliths on their dedicated servers. The two biggest differences are that services can be deployed by the scheduler on any available server node and that it is possible to have many instances of a single service running in parallel.

The Hawkular project is now introducing the Hawkular OpenShift Agent (HOSA), which is deployed in OpenShift as an infrastructure-level component. Hosa runs on each node, monitors the pods on that node, and sends the retrieved metrics to Hawkular-Metrics. Hosa may replace Heapster in the longer run.

The monitoring scenario

The following drawing shows the scenario that I am going to describe below.

Scenario overview
Figure 1. The Scenario

The numbers in the round blue circles will be referenced below as (n).

As an example we take microservices created by the Obsidian Toaster generators and deploy them into OpenShift with the help of the Fabric8 Maven Plugin.

Note
It is not necessary to retrieve the source code from Obsidian Toaster. Any source that can be deployed to OpenShift via the Fabric8 Maven plugin will do.

After getting the source code from the generator, run the respective mvn clean package goals as described in the README of the source (1). If that works well, you can then deploy the result into OpenShift via mvn fabric8:deploy -Popenshift (2). This will create a so-called Source-to-Image (S2I) build in OpenShift, which takes the provided artifacts and config files, creates images, and deploys them in a pod with an associated service etc. (3).

The latter is driven by some internal settings of the Maven plugin, which also merges in files from src/main/fabric8/. More on that below.

For each JVM that the Fabric8 plugin deploys, it also puts a Jolokia agent in the VM, as you can see in (3). This agent exposes internal JMX metrics via the Jolokia-REST protocol, on a pre-defined port with a default user name and a random password. The data of this pod is also exposed via the API-proxy, so that you can click on the Open Java Console link in the next figure to get to a JMX browser.
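
If you are curious what the Jolokia-REST protocol looks like on the wire, here is a minimal sketch (not part of the setup described in this post; the pod IP is a placeholder and plain http is used for brevity, while the agent later talks https) of reading the same ThreadCount attribute that we will collect further below:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class JolokiaReadSketch {
        public static void main(String[] args) throws Exception {
            String podIp = "10.128.0.12";                 // hypothetical pod IP
            String credentials = "jolokia:test4hawkular"; // default user plus the pod's Jolokia password
            String auth = Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));

            // Jolokia "read" operation: /jolokia/read/<mbean>/<attribute>
            URL url = new URL("http://" + podIp + ":8778/jolokia/read/java.lang:type=Threading/ThreadCount");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Authorization", "Basic " + auth);

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                // Prints a small JSON document whose "value" field is the current thread count
                in.lines().forEach(System.out::println);
            }
        }
    }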

Pod in OpenShift
Figure 2. A pod in OpenShift

The Hawkular OpenShift Agent

The Hawkular OpenShift Agent (Hosa) runs as part of the OpenShift infrastructure inside the openshift-infra namespace on a per-node basis, monitoring eligible pods on that node. The GitHub page has pretty good information on how to compile and deploy it (there are also pre-built Docker images available).

In order for Hosa to know which pods to monitor and what metrics to collect, it looks at each pod and checks whether it declares a ConfigMap volume with the name hawkular-openshift-agent.

Config Map for Hosa, (4)
kind: ConfigMap
apiVersion: v1
metadata:
  name: obs-java-hosa-config.yml  (1)
  namespace: myproject
data:
  hawkular-openshift-agent: | (2)
    endpoints:
    - type: jolokia (3)
      collection_interval: 60s (4)
      protocol: "https"
      tls:
        skip_certificate_validation: true (5)
      port: 8778
      credentials:
        username: jolokia
        password: secret:hosa-secret/password  (6)
      path: /jolokia/
      tags:
        name: ${POD:label[project]} (7)
      metrics: (8)
      - name: java.lang:type=Threading#ThreadCount
        id: the_thread_count
        type: gauge
  1. Name of the map

  2. Configuration for the agent to monitor matching pods

  3. This is a Jolokia-kind of endpoint

  4. Collect the metrics every 60s. This is a Golang Duration and is equal to 1m

  5. The agent checks for valid certificates in case of https. We skip this check

  6. This is the password used to talk to Jolokia, which we obtain from a secret — more below.

  7. Each metric is tagged with a label name=<project name>

  8. Definition of the metrics to be collected

You can save the above ConfigMap into a file and deploy it into OpenShift via oc create -f <configmap.yml>.

Note
Hosa can also grab data from Prometheus-style endpoints. In the future we may switch to using this protocol, as JMX is a very JVM-centric concept and microservices may also be created in non-JVM environments like Node.js or Ruby.

Now the question is: how do we tell our deployment to use this config map? In Figure 2 you can see that this gets declared as a volume. In order to do so, we need to go back to our local source code, add a new file deployment.yml into src/main/fabric8/, and re-deploy our source with mvn package fabric8:deploy -Popenshift. While we are at it, we also want to make sure that the password for Jolokia is not hard-coded, but obtained from an OpenShift secret.

deployment.yml
# This gets merged into the main openshift.yml's deployment config via f8 plugin
spec:
  template:
    spec:
      volumes:
        - name: hawkular-openshift-agent (1)
          configMap:
            name: obs-java-hosa-config.yml (2)
      containers:
        - env:
          - name: AB_JOLOKIA_PASSWORD_RANDOM (3)
            value: "false"
          - name: AB_JOLOKIA_PASSWORD (4)
            valueFrom:
              secretKeyRef: (5)
                name: hosa-secret
                key: password
  1. The magic name of the volume so that Hosa can find it

  2. The name of the config map to use. See (1) in Config Map for Hosa above.

  3. Tell Jolokia not to create a random password

  4. Make OpenShift set the password, which it gets from a secret

  5. The secret to query is named hosa-secret and we want the entry with the name password.

Hosa gets notified once you redeploy the application; it will see the volume and start monitoring the pod. Which leaves us with the OpenShift secret.

Creating the secret, (5)

To create a secret that holds our password we need to do two things. First we need to encode the password in base64 format.

Base64 encoding of the password
$ echo -n "test4hawkular" | base64
dGVzdDRoYXdrdWxhcg==

And then we need to create a yml file for the secret.

hosa-secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: hosa-secret (1)
type: Opaque
data:
  password: dGVzdDRoYXdrdWxhcg== (2)
  1. Name of the secret

  2. Key is 'password', value is the encoded password from the previous step

You can deploy that secret with oc create -f hosa-secret.yml.

Display data with Grafana

Now that we have the agent collecting data and storing it in Hawkular-Metrics, we can look at it with the help of Grafana. Joel Takvorian has described this pretty well, so I am not going to repeat the setup in detail here.

Note
To get started quickly, you can run $ oc new-app docker.io/hawkular/hawkular-grafana-datasource. Then, when the service is created, click on Add route in the OpenShift UI to expose Grafana to the outside world.

To configure the datasource in Grafana, we can now use the namespace of the project and a token:

Getting host, tenant and token to configure the datasource
$ oc whoami
developer
$ oc project
Using project "myproject" on server "https://pintsize:8443". (1)
$ oc whoami -t
JhrqvcFTnEuP3XRPrLbwAAfpbZV4hYmne3-JMIXv4LQ (2)
$ oc login -u system:admin (3)
$ oc get svc hawkular-metrics -n openshift-infra (4)
NAME               CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
hawkular-metrics   172.30.236.16   <none>        443/TCP   12d (5)
# This is an alternative
$ oc get route hawkular-metrics -n openshift-infra (6)
NAME               HOST/PORT
hawkular-metrics   metrics-openshift-infra.172.31.7.9.xip.io
  1. 'myproject' will be the tenant

  2. The token for authentication

  3. OpenShift infrastructure is not visible to the developer account

  4. Get the Cluster-IP of the Hawkular-Metrics service

  5. Cluster-IP is the host part of the https url of the metrics service, which follows the pattern of https://<cluster-ip>/hawkular/metrics

  6. As an alternative, get the public host from the OpenShift route

We can then use this information to define our datasource in Grafana. If you want, you can also make this the default datasource by ticking the respective checkbox. Access mode needs to be proxy, as the service is not visible (under that IP) from outside of OpenShift. Instead of using the OpenShift-internal service, we can also use the external host defined by the hawkular-metrics route.

Grafana datasource setup
Figure 3. Grafana datasource setup

Using a token works well and is quickly done, but it has the caveat that tokens obtained with oc whoami -t will expire and thus should only be used to quickly test if the datasource works.

A better solution is to use a service account (7) instead of the token, which I am going to explain next.

Create a service account, (7)

Creating a Service Account is easy and can be done via oc create sa view-metrics.

When you look at it with oc describe sa/view-metrics, it shows a list of tokens at the end:

$ oc describe sa/view-metrics
Name:		view-metrics
Namespace:	myproject
Labels:		<none>

Image pull secrets:	view-metrics-dockercfg-rmnee

Mountable secrets: 	view-metrics-dockercfg-rmnee
                   	view-metrics-token-vowtw

Tokens:            	view-metrics-token-t98qw (1)
                   	view-metrics-token-vowtw
  1. The token to be used in the next step

Those tokens are actually secrets that were populated by OpenShift. Inspecting one of the tokens then reveals a token string that we can use inside of Grafana:

Warning
Only the token that is not also listed as a mountable secret can be used. The other one will not work; Grafana will report a "Forbidden" message when trying to save the datasource.
$ oc describe secret view-metrics-token-t98qw
Name:		view-metrics-token-t98qw
Namespace:	myproject
Annotations:    kubernetes.io/created-by=openshift.io/create-dockercfg-secrets
                kubernetes.io/service-account.name=view-metrics (1)
[...]
Data
====
namespace:	9 bytes
service-ca.crt:	2186 bytes
token:		eyJhbGciOiJS... (2)
  1. This annotation needs to be present for the secret to be usable

  2. Long token string

Using the token from the ServiceAccount
Figure 4. Using the token from the ServiceAccount

Result

So finally we can see the thread count of our Obsidian sample application:

Thread count
Figure 5. Thread count in Grafana

In the chart we can now see the thread count of our application. You can see that at around 9am we scaled the app from one to two pods.





Reporting Dropwizard metrics to Hawkular

16 January 2017, by Joel Takvorian

Dropwizard metrics (aka Codahale, aka Yammer) is a well-known, successful metrics framework that is easy to plug into a Java application. You just create a metrics registry, create metrics, feed them, and voilà. But Dropwizard metrics doesn't store anything by itself; that is delegated to the so-called "reporters". So we've built the hawkular-dropwizard-reporter.

This article will show you:

  • how you can create custom metrics

  • how you can take advantage of existing middleware that uses Dropwizard metrics

  • how to fine-tune the Hawkular reporter to make data exploitation easy

As an example, I will use a very simple application to benchmark some data structures (caches) in real time, and output results as metrics. This is not a serious benchmark and we don’t really care about the results: our purpose is nothing more than to illustrate how we can work with Hawkular and Dropwizard.

Overview

Minimal setup

You need very few things to start using the Hawkular Dropwizard reporter. Obviously, you must have a running instance of Hawkular Metrics. If you don’t have it yet, check out the installation guide.

Then, add a Maven dependency to your Java application:

    <dependency>
        <groupId>org.hawkular.metrics</groupId>
        <artifactId>hawkular-dropwizard-reporter</artifactId>
        <version>0.1.1</version>
    </dependency>

In your Java code, create a MetricRegistry and pass it to the HawkularReporter:

    MetricRegistry registry = new MetricRegistry();
    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .build();
    hawkularReporter.start(1, TimeUnit.SECONDS);

And that's it. From now on, all metrics that are added to the registry will be reported to Hawkular Metrics. The default configuration expects to find the Hawkular server on localhost:8080; you can change it with the builder's uri method. You can explore the builder options to configure it.

Tip
In this example, "sample-tenant" refers to the tenant used in Hawkular. You can set whatever you want instead. Check Hawkular documentation to learn more about multi-tenancy.

If, for instance, we want to connect to a Hawkular pod running in OpenShift, we would configure the reporter similar to this:

    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .uri("https://metrics.192.168.42.63.xip.io/hawkular/metrics")
            .bearerToken("ABCDEFGHIJ1234")
            .build();

To conclude the initial setup, note that it is often good practice to include some host-based disambiguation string in the metric names, so that if your application is deployed on several hosts and they all talk to the same Hawkular instance using the same tenant, they will not conflict by feeding the same metrics. A convenient way to do that is to set a prefix in the reporter configuration, for instance using the host name:

    String hostname;
    try {
        hostname = InetAddress.getLocalHost().getCanonicalHostName();
    } catch (UnknownHostException e) {
        hostname = "?";
    }
    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .prefixedWith(hostname + ".")
            .build();

Dropwizard → Hawkular

As stated before, our sample app will do some (very) basic benchmarking on several cache implementations: we'll have a Guava cache and a local EhCache.

Note
This sample app is on my GitHub: https://github.com/jotak/hawkular-dropwizard-sample

We create a very simple interface that allows us to get data from the cache, count the number of elements, and know whether the latest read was from the cache or not:

interface Backend {
    Object get(String key);
    long count();
    void init(Map<String, Object> presetElements);
    boolean isLastReadFromCache();
}

You can see the GuavaBackend and EhcacheBackend implementations on GitHub; there's nothing fancy here.

EhCache is initialized programmatically:

    CacheManager cacheManager = CacheManager.newInstance();
    Ehcache cache = cacheManager.addCacheIfAbsent("testCache");

Now, the Benchmark class. This is where we create metrics and feed them, with the pretext of a small scenario. Here’s the run method:

    private void run() {
        final DatabaseStub fakeDb = new DatabaseStub(); (1)
        final BackendMonitoring monitoring = new BackendMonitoring(registry); (2)
        Map<String, Object> presetElements = IntStream.range(0, 100000)
                .mapToObj(Integer::new)
                .collect(Collectors.toMap(i -> UUID.randomUUID().toString(), i -> i)); (3)

        (4)
        monitoring.runScenario(presetElements, new GuavaBackend(fakeDb), GuavaBackend.NAME);
        monitoring.runScenario(presetElements, new EhcacheBackend(fakeDb, ehcache), EhcacheBackend.NAME);
    }
  1. The class DatabaseStub is used to simulate a database with latency, using just a HashMap for storage (view it on GitHub).

  2. BackendMonitoring will set up monitoring for a given backend.

  3. Creates a collection of items to store in the cache and use in the scenarios.

  4. Run the scenario for each Backend implementation (described below).

Now, BackendMonitoring.runScenario:

    void runScenario(Map<String, Object> presetElements, Backend backend, String name) {
        System.out.println("Starting scenario for " + name);
        registry.register(name + ".size", (Gauge<Long>) backend::count); (1)
        Timer readTimer = registry.timer(name + ".read"); (2)
        final Meter readCacheMeter = registry.meter(name + ".cache.read"); (3)
        final Meter readDbMeter = registry.meter(name + ".db.read"); (4)
        final Counter numberItemsRead = registry.counter(name + ".total.read.count"); (5)
        // Setup preset elements
        backend.init(presetElements);
        List<String> keys = new ArrayList<>(presetElements.keySet());
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        Stopwatch watch = Stopwatch.createStarted();
        while (watch.elapsed(TimeUnit.MINUTES) < 5) {
            int pos = rnd.nextInt(0, keys.size());
            runWithBenchmark(() -> {
                backend.get(keys.get(pos));
                if (backend.isLastReadFromCache()) {
                    readCacheMeter.mark();
                } else {
                    readDbMeter.mark();
                }
                numberItemsRead.inc();
            }, readTimer);
        }
        // Reset size gauge to 0
        backend.init(new HashMap<>());
        System.out.println("Ending scenario for " + name);
    }

And finally, BackendMonitoring.runWithBenchmark:

    private void runWithBenchmark(Runnable r, Timer readTimer) {
        final Timer.Context ctx = readTimer.time();
        try {
            r.run();
        } finally {
            ctx.stop();
        }
    }

Here we create several metrics:

  1. A Gauge that will track the number of elements in cache.

  2. A Timer metric. Each time the runWithBenchmark method is called, that timer computes the Runnable execution time.

  3. A Meter that is invoked each time data is read from cache (rather than DB).

  4. The opposite: a Meter that is invoked each time data is read from db.

  5. A Counter that tracks the total number of reads. We could actually get rid of it, because its value could be retrieved from readDbMeter.count + readCacheMeter.count (yes, a Meter includes a Counter).

You can learn more about Dropwizard metric types from its documentation.

Remember that since we associated the Hawkular reporter with the metrics registry, all metrics are automatically reported into Hawkular Metrics.

Now, let’s run the benchmark. I’m using Grafana with its Hawkular plugin to display graphs.

Custom metrics
  • Upper-left: storage size (yellow = Guava, green = EhCache)

  • Upper-right: read response time (yellow = Guava, green = EhCache)

  • Bottom-left: read cache vs DB - mean rate (orange = Guava/db, yellow = Guava/cache, blue = EhCache/db, green = EhCache/cache)

  • Bottom-right: read cache vs DB - count (orange = Guava/db, yellow = Guava/cache, blue = EhCache/db, green = EhCache/cache)

We can see the Guava cache scenario in the first 5 minutes, followed by the EhCache scenario. Note how the storage size falls abruptly about halfway through the EhCache scenario: this is probably due to a cache eviction mechanism that is present by default (given we didn't configure the cache at all).

We can correlate that with EhCache's response time, which does not improve as fast as Guava's while the cache fills up. However, we can suppose this is compensated for by a smaller memory footprint.

Middleware → Dropwizard → Hawkular

So, we know how to create metrics. That's perfect for tracking values that are very specific to an application. But the best part is that a lot of existing Java middleware already provides tons of metrics through Dropwizard, which you can integrate very easily in your application.

There is a non-exhaustive list in the Dropwizard documentation (here and there). It includes EhCache, the Apache HTTP client, Jetty, etc. But there are actually many others. Some frameworks, like Vert.x, may also report metrics directly to Hawkular, so you don't even need to go through Dropwizard at all (but still, you can).

Since we're already using EhCache in our sample app, let's try to get the EhCache middleware metrics. We first need to add a Maven dependency:

    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-ehcache</artifactId>
      <version>3.1.2</version>
    </dependency>

When we initialize EhCache programmatically, we create an InstrumentedEhcache object, which is its Dropwizard avatar:

    private Benchmark(MetricRegistry registry, Ehcache cache) {
        this.registry = registry;
        ehcache = InstrumentedEhcache.instrument(registry, cache);
    }

And then we use this InstrumentedEhcache instead of the initial EhCache object in the rest of our code. That's it. Every time something is done on EhCache, metrics will be fed.

See for instance what we get in Grafana, when the EhcacheBackend is invoked during our scenario:

EhCache metrics

Here we track some metrics such as the mean of gets and puts, and the number of memory hits and misses. See the full list of available metrics.

What else could we do… We're on the JVM, right? We could get monitoring data from MX Beans (such as MemoryMXBean) and create our own metrics in Dropwizard, but there's already a module that does the job:

    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-jvm</artifactId>
      <version>3.1.2</version>
    </dependency>

After creating the MetricRegistry, you can add some preset JVM metric sets, such as GarbageCollectorMetricSet, MemoryUsageGaugeSet, ThreadStatesGaugeSet etc.
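
For example, registering them right after the reporter setup could look like this (a minimal sketch; the name prefixes are arbitrary and the classes come from the metrics-jvm module):

    // Preset metric sets from com.codahale.metrics.jvm;
    // each one expands into several gauges under the given name prefix.
    registry.register("jvm.gc", new GarbageCollectorMetricSet());
    registry.register("jvm.memory", new MemoryUsageGaugeSet());
    registry.register("jvm.threads", new ThreadStatesGaugeSet());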

Having them in Hawkular will help you quickly correlate information, such as increasing heap or non-heap memory usage related to the use of a cache in our example.

JVM metrics

Heap vs non-heap memory used, plus some counters on threads and GC. See the drop in heap memory at about the third quarter of the timeline? It matches the cache eviction in EhCache.

Note
An interesting fact is that the Cassandra database also exposes metrics through Dropwizard, and Hawkular uses Cassandra internally for metrics storage. This means that it can be self-monitored with the Hawkular Dropwizard reporter. If you want to read more on this subject, check out Cassandra metrics and some instructions here.

Fine-tuning the reporter

Tagging

There are some improvements we can bring to our sample app. First of all, we could tag our metrics.

Tagging may not seem very important at first sight, but over time when you get more and more metrics, and when you try to exploit them in a dynamic way, tags become crucial.

Even for this sample app, when building the Grafana dashboard we soon want to make it generic, so that it can show any other competing cache implementation. In order to do that, we will create per-metric tags based on regexps, just by adding a few lines to the HawkularReporter builder:

    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .addRegexTag(Pattern.compile(GuavaBackend.NAME + "\\..*"), "impl", GuavaBackend.NAME)
            .addRegexTag(Pattern.compile(EhcacheBackend.NAME + "\\..*"), "impl", EhcacheBackend.NAME)
            .addGlobalTag("hostname", hostname)
            .prefixedWith(hostname + ".")
            .build();

And as you can see I also added a global tag with the hostname.

With that configuration, every metric whose name starts with "guava." will be tagged "impl:guava", and similarly for ehcache. Every metric reported through this reporter will be tagged with the hostname.

Filtering

If you use Grafana with this sample app, you've probably noticed how annoying it is to find and select the metric you want to display, because it's drowned among tons of other metrics. And obviously, the more metrics you store, the more resources are consumed. So you can filter out the metrics you don't want.

There are two kinds of filters:

  • the usual built-in Dropwizard filters, which you can set using HawkularReporterBuilder.filter by implementing MetricFilter (a short sketch follows at the end of this section)

  • another kind of filter that is very specific to the Hawkular reporter, called MetricComposition, for which I must provide some details:

As stated before, Dropwizard has several metric types (gauges, meters, timers etc.), some of them being composed of multiple values. So they don't match 1-1 with Hawkular metric types, which are made of simple values (basically, doubles for gauges and longs for counters; other types exist but are unused in the Dropwizard reporter).

In order not to lose any piece of data, Dropwizard metrics are exploded into several metrics in Hawkular. For instance, a Meter named guava.cache.read will be translated into 4 gauges (guava.cache.read.1minrt, guava.cache.read.5minrt, guava.cache.read.15minrt, guava.cache.read.meanrt) and 1 counter (guava.cache.read.count) in Hawkular. The full translation table is described here.

From the Dropwizard point of view, there is no metric called "guava.cache.read.1minrt", so you cannot filter it out with Dropwizard filters. However, you can act on the "metric composition" in the Hawkular reporter, either by providing the full metric name:

    // builder.
      .setMetricComposition("guava.cache.read", Lists.newArrayList("1minrt", "meanrt", "count"))

or by using a regexp, as I'm doing in the sample app:

    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .addRegexTag(Pattern.compile(GuavaBackend.NAME + "\\..*"), "impl", GuavaBackend.NAME)
            .addRegexTag(Pattern.compile(EhcacheBackend.NAME + "\\..*"), "impl", EhcacheBackend.NAME)
            .addGlobalTag("hostname", hostname)
            .prefixedWith(hostname + ".")
            .setRegexMetricComposition(Pattern.compile("net\\.sf\\.ehcache"), Lists.newArrayList("mean", "meanrt", "5minrt", "98perc", "count"))
            .setRegexMetricComposition(Pattern.compile(".*"), Lists.newArrayList("mean", "meanrt", "count"))
            .build();

Here, we configure all net.sf.ehcache.* metrics (EhCache middleware metrics) to provide their mean, meanrt, 5minrt, 98perc and count attributes. All other attributes will be discarded. For all other metrics we only keep mean, meanrt and count.

The declaration order matters, since only the first matching pattern will be used for a given metric name.

Tip
Using plain strings rather than regexps for metric composition is more efficient, since they are internally indexed in a HashMap.
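
Coming back to the first kind of filter listed above: a standard Dropwizard MetricFilter can be handed to the builder as well. Here is a minimal sketch (not taken from the sample app; the name-based condition is just an example) that only reports the metrics of our two backends:

    // Standard Dropwizard filtering: only metrics whose name starts with one of
    // our backend prefixes are handed over to the Hawkular reporter at all.
    HawkularReporter filteredReporter = HawkularReporter.builder(registry, "sample-tenant")
            .filter((name, metric) -> name.startsWith(GuavaBackend.NAME)
                    || name.startsWith(EhcacheBackend.NAME))
            .build();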

That was a quite complete tour of the Hawkular Dropwizard reporter. Some useful links:





Extending Complex Event Processing in Hawkular Alerting

13 January 2017, by Lucas Ponce

Stream Processing versus Polling Processing

Hawkular Alerting uses different techniques to detect behaviours by defining rules.

The main implementation is based on a Stream Processing design: Hawkular Alerting analyzes streams of data and events ordered in a timeline, searching for condition matches. This technique is also known as Event Processing (or Complex Event Processing), especially when the processing of incoming data might need multiple conditions.

Stream Processing

This method is particularly good for identifying meaningful happenings in complex scenarios and responding to them as quickly as possible.

Hawkular Alerting also supports an alerter-based polling technique that allows periodic processing of defined queries against a backend.

Polling Processing

This polling technique is also good for non-real-time scenarios, for example to compare historical values over large time intervals using statistical operators like averages, medians or percentiles.

Sliding Windows on Complex Event Processing

An interesting aspect of the detection logic is the combination of Stream and Polling processing characteristics.

A Sliding Window is a way to define a scope in the stream of data and events received in the timeline. This Sliding Window lets us define special rules that apply only to the scoped data and events.

Sliding Windows

A powerful use case is when we can aggregate the scoped events and define expressions on them.
For example, let's examine these high-level scenarios:

  • Marketing
    Detect when a customer buys several items in a short period of time

  • Fraud
    Alert when a customer buys from several locations in a short period of time

  • Customer loyalty
    Detect specific transactions to offer premium discounts to customers

All of these scenarios need some sort of aggregation (i.e. aggregating events by customer/transaction id), expressions defined on aggregated fields, and scoping of these events using a Sliding Window.

Events Aggregation Extension

Hawkular Alerting has introduced a new Extension called EventsAggregation.
Extensions are a new mechanism to pre-process data or events before they are processed by the core Alerting Engine.

The EventsAggregation Extension allows us to scope Sliding Windows on Events and define expressions on aggregated data. This new feature can be used on Triggers with an ExternalCondition and the EventsAggregation alerter.

For example, we can define a Trigger that represents the Marketing scenario previously described:

{
  "triggers":[
    {
      "trigger":{
        "id": "marketing-scenario",
        "name": "Marketing Scenario",
        "description": "Detect when a customer buys several items in a short period of time",
        "severity": "HIGH",
        "enabled": true,
        "actions":[
          {
            "actionPlugin": "email",
            "actionId": "notify-to-marketing"
          }
        ],
        "tags":{
            "HawkularExtension":"EventsAggregation"
        }
      },
      "conditions":[
        {
          "triggerMode": "FIRING",
          "type": "EXTERNAL",
          "alerterId":"EventsAggregation",
          "dataId": "marketing",
          "expression": "event:groupBy(context.accountId):window(time,10s):having(count > 2)"
        }
      ]
    }
  ],
  "actions":[
    {
      "actionPlugin": "email",
      "actionId": "notify-to-marketing",
      "properties": {
        "to": "marketing@hawkular.org"
      }
    }
  ]
}

The expression used can be described as follows:

groupBy(context.accountId)      Group window events by context "accountId" field
window(time,10s)                Define a sliding time window of 10 seconds
having(count > 2)               Define an expression on the grouped events

In other words, this condition will be true each time there are more than two events with the same accountId within a 10-second window.

In a similar way, we can express the Fraud scenario described earlier with the expression:

"event:groupBy(tags.accountId):window(time,10s):having(count > 1, count.tags.location > 1)"

Where

groupBy(tags.accountId)                       Group window events by the "accountId" tag
window(time,10s)                              Define a sliding time window of 10 seconds
having(count > 1, count.tags.location > 1)    Define an expression on the grouped events

This condition will be true when there is more than one event with more than one distinct location tag, thus detecting when events for the same accountId happen from different places.

The two previous expressions group all events of the time window. We might have scenarios where only specific events should be grouped.

For these cases we can add filters to the expressions, as in the following example:

"event:groupBy(tags.traceId):filter((category == \"Credit Check\" && text == \"Exceptionally Good\") || (category == \"Stock Check\" && text == \"Out of Stock\")):having(count > 1, count.tags.accountId == 1)"

This expression will group the events filtered by the expression

filter(
    (category == \"Credit Check\" && text == \"Exceptionally Good\") ||
    (category == \"Stock Check\" && text == \"Out of Stock\")
)

Note that this expression doesn’t define an explicit sliding time window, so it will use a default expiration window.

Use cases

Stream Processing and Polling Processing might be used for similar scenarios.

The EventsAggregation Extension groups Events in memory, so it is designed for real-time scenarios with relatively short sliding windows.

Conclusion

The EventsAggregation Extension is a useful addition to Hawkular Alerting that will extend the scenarios and types of behaviours that can be detected.

In future work we will enhance this extension to cover more use cases (potentially also aggregation of data).

We hope this short introduction helps to show how the EventsAggregation Extension provides powerful new CEP capabilities for Hawkular Alerting.

Comments and questions are welcome, here or in the #hawkular room on freenode.





Hawkular Metrics 0.23.0 - Release

04 January 2017, by Stefan Negrea

I am happy to announce release 0.23.0 of Hawkular Metrics. This release is anchored by performance and stability improvements.

Here is a list of major changes:

Hawkular Alerting - Included

Hawkular Metrics Clients

Release Links

A big "Thank you" goes to John Sanda, Matt Wringe, Michael Burman, Joel Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project contributions.





Hawkinit

22 December 2016, by Jirka Kremser

Introducing Hawkinit

This simple CLI tool written in Node.js will help you set up a running instance of Hawkular Services, multiple instances of Cassandra, and WildFly servers that have the Hawkular agent installed and configured to report metrics to Hawkular Services.

Users can select whether they want the WildFly servers running in standalone mode or as a managed domain, and can also choose how many instances to spawn. If domain mode is selected, there are some prepared scenarios with different profiles and different numbers of servers and server groups per host controller. Instead of standalone mode, one can also spawn a domain with multiple host controllers, simulating a complex real-world scenario in a couple of seconds.

Under the hood, the application dynamically creates a docker-compose.yml file in the temp directory with the parameters obtained from the answers, and runs the services as linked containers that communicate among themselves. No rocket science, but it can be handy when trying to set something up quickly or when trying out the Hawkular ecosystem.

Usage

The usage is really simple: assuming npm, docker and docker-compose are installed and the current user belongs to the docker group, all that is needed is to run hawkinit and interactively answer all the questions.

$ npm install hawkinit -g
$ hawkinit

Example

Here is an example of spinning up two standalone WildFly servers reporting to Hawkular Services with one Cassandra node.

usage demo

Contributions

The GitHub repository is here and contributions are more than welcome. In case of any issue, do not hesitate to report it here.




