Reporting Dropwizard metrics to Hawkular

A blog post by Joel Takvorian

Dropwizard metrics (a.k.a. Codahale, a.k.a. Yammer) is a well-known, successful metrics framework that is easy to plug into a Java application. You just create a metrics registry, create metrics, feed them, and voilà. But Dropwizard metrics doesn’t store anything by itself: storage is delegated to so-called "reporters". So we’ve built the hawkular-dropwizard-reporter.

This article will show you:

how you can create custom metrics
how you can take advantage of existing middleware that uses Dropwizard metrics
how to fine-tune the Hawkular reporter to make data exploitation easy

As an example, I will use a very simple application to benchmark some data structures (caches) in real time, and output results as metrics. This is not a serious benchmark and we don’t really care about the results: our purpose is nothing more than to illustrate how we can work with Hawkular and Dropwizard.

Minimal setup

You need very few things to start using the Hawkular Dropwizard reporter. Obviously, you must have a running instance of Hawkular Metrics. If you don’t have it yet, check out the installation guide.

Then, add a Maven dependency to your Java application:

    <dependency>
        <groupId>org.hawkular.metrics</groupId>
        <artifactId>hawkular-dropwizard-reporter</artifactId>
        <version>0.1.1</version>
    </dependency>

In your Java code, create a MetricRegistry and pass it to the HawkularReporter:

    MetricRegistry registry = new MetricRegistry();
    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .build();
    hawkularReporter.start(1, TimeUnit.SECONDS);

And that’s it. From now on, all metrics added to the registry will be reported to Hawkular Metrics. The default configuration expects to find the Hawkular server on localhost:8080; you can change that with the builder’s uri method, and explore the other builder options to configure the reporter further.

Tip	In this example, "sample-tenant" refers to the tenant used in Hawkular; you can use whatever tenant name you want instead. Check the Hawkular documentation to learn more about multi-tenancy.

If, for instance, we want to connect to a Hawkular pod running in OpenShift, we would configure the reporter like this:

    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .uri("https://metrics.192.168.42.63.xip.io/hawkular/metrics")
            .bearerToken("ABCDEFGHIJ1234")
            .build();

To conclude the initial setup, note that it is often good practice to include a host-based disambiguation string in metric names. That way, if your application is deployed on several hosts that all report to the same Hawkular instance using the same tenant, the hosts will not conflict by feeding the same metrics. A convenient way to do that is to set a prefix in the reporter configuration, for instance using the host name:

    String hostname;
    try {
        hostname = InetAddress.getLocalHost().getCanonicalHostName();
    } catch (UnknownHostException e) {
        hostname = "?";
    }
    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .prefixedWith(hostname + ".")
            .build();

Dropwizard → Hawkular

As stated before, our sample app will do some (very) basic benchmarking on several cache implementations: a Guava cache and a local EhCache.

Note	This sample app is on my GitHub: https://github.com/jotak/hawkular-dropwizard-sample

We create a very simple interface that allows us to get data from the cache, count the number of elements, and know whether the latest read was served from the cache or not:

    interface Backend {
        Object get(String key);
        long count();
        void init(Map<String, Object> presetElements);
        boolean isLastReadFromCache();
    }

You can see the GuavaBackend and EhcacheBackend implementations on GitHub; there’s nothing fancy there.
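To make the contract concrete, here is a minimal, hypothetical implementation; it is not the sample app's GuavaBackend or EhcacheBackend. An unbounded HashMap serves as the cache in front of a second map standing in for the database. The MapBackend name is made up, and the sketch assumes that init seeds the backing store and empties the cache, which is consistent with the scenario calling init(new HashMap<>()) to bring the size gauge back to 0.

```java
import java.util.HashMap;
import java.util.Map;

// Same contract as the Backend interface shown above.
interface Backend {
    Object get(String key);
    long count();
    void init(Map<String, Object> presetElements);
    boolean isLastReadFromCache();
}

// Hypothetical read-through backend: "cache" sits in front of "db".
class MapBackend implements Backend {
    private final Map<String, Object> db = new HashMap<>();
    private final Map<String, Object> cache = new HashMap<>();
    private boolean lastReadFromCache;

    @Override
    public Object get(String key) {
        Object value = cache.get(key);
        lastReadFromCache = value != null;
        if (value == null) {
            // Cache miss: read from the "database" and remember the value
            value = db.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    @Override
    public long count() {
        return cache.size(); // what a "<name>.size" gauge would report
    }

    @Override
    public void init(Map<String, Object> presetElements) {
        // Assumption: init seeds the store and empties the cache
        db.clear();
        db.putAll(presetElements);
        cache.clear();
    }

    @Override
    public boolean isLastReadFromCache() {
        return lastReadFromCache;
    }
}
```

A real cache such as Guava or EhCache would also bound its size and evict entries, which this sketch deliberately does not do.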

EhCache is initialized programmatically:

    CacheManager cacheManager = CacheManager.newInstance();
    Ehcache cache = cacheManager.addCacheIfAbsent("testCache");

Now, the Benchmark class. This is where we create metrics and feed them, through a small scenario. Here’s the run method:

    private void run() {
        final DatabaseStub fakeDb = new DatabaseStub(); // (1)
        final BackendMonitoring monitoring = new BackendMonitoring(registry); // (2)
        Map<String, Object> presetElements = IntStream.range(0, 100000)
                .boxed()
                .collect(Collectors.toMap(i -> UUID.randomUUID().toString(), i -> i)); // (3)

        // (4)
        monitoring.runScenario(presetElements, new GuavaBackend(fakeDb), GuavaBackend.NAME);
        monitoring.runScenario(presetElements, new EhcacheBackend(fakeDb, ehcache), EhcacheBackend.NAME);
    }

(1) The class DatabaseStub is used to simulate a database with latency, using just a HashMap for storage (view it on GitHub).
(2) BackendMonitoring sets up monitoring for a given backend.
(3) Create a collection of 100,000 items to store in the cache, and use them in the scenarios.
(4) Run the scenario for each Backend implementation (described below).
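The actual DatabaseStub is on GitHub; as a rough idea of what "a database with latency, using just a HashMap" can look like, here is a hypothetical sketch. The get/put method names and the 2 ms default delay are assumptions, not the original code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical slow "database": a HashMap plus an artificial delay on every access.
class DatabaseStub {
    private final Map<String, Object> storage = new HashMap<>();
    private final long latencyMillis;

    DatabaseStub() {
        this(2); // arbitrary default latency (assumption)
    }

    DatabaseStub(long latencyMillis) {
        this.latencyMillis = latencyMillis;
    }

    Object get(String key) {
        simulateLatency();
        return storage.get(key);
    }

    void put(String key, Object value) {
        simulateLatency();
        storage.put(key, value);
    }

    private void simulateLatency() {
        try {
            TimeUnit.MILLISECONDS.sleep(latencyMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing it
            Thread.currentThread().interrupt();
        }
    }
}
```

Every cache miss in a backend built on top of such a stub pays the delay, which is what makes the cache-versus-DB read times visibly different in the graphs.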

Now, BackendMonitoring.runScenario:

    void runScenario(Map<String, Object> presetElements, Backend backend, String name) {
        System.out.println("Starting scenario for " + name);
        registry.register(name + ".size", (Gauge<Long>) backend::count); // (1)
        Timer readTimer = registry.timer(name + ".read"); // (2)
        final Meter readCacheMeter = registry.meter(name + ".cache.read"); // (3)
        final Meter readDbMeter = registry.meter(name + ".db.read"); // (4)
        final Counter numberItemsRead = registry.counter(name + ".total.read.count"); // (5)
        // Set up preset elements
        backend.init(presetElements);
        List<String> keys = new ArrayList<>(presetElements.keySet());
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        Stopwatch watch = Stopwatch.createStarted();
        // Read random keys for 5 minutes
        while (watch.elapsed(TimeUnit.MINUTES) < 5) {
            int pos = rnd.nextInt(0, keys.size());
            runWithBenchmark(() -> {
                backend.get(keys.get(pos));
                if (backend.isLastReadFromCache()) {
                    readCacheMeter.mark();
                } else {
                    readDbMeter.mark();
                }
                numberItemsRead.inc();
            }, readTimer);
        }
        // Reset size gauge to 0
        backend.init(new HashMap<>());
        System.out.println("Ending scenario for " + name);
    }

And finally, BackendMonitoring.runWithBenchmark:

    private void runWithBenchmark(Runnable r, Timer readTimer) {
        final Timer.Context ctx = readTimer.time();
        try {
            r.run();
        } finally {
            ctx.stop();
        }
    }

Here we create several metrics:

(1) A Gauge that tracks the number of elements in the cache.
(2) A Timer. Each time the runWithBenchmark method is called, that timer measures the execution time of the Runnable.
(3) A Meter that is marked each time data is read from the cache (rather than from the DB).
(4) The opposite: a Meter that is marked each time data is read from the DB.
(5) A Counter that tracks the total number of reads. We could actually get rid of it, because its value can be retrieved from readDbMeter.getCount() + readCacheMeter.getCount() (yes, a Meter also keeps a count).
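That last point is easy to check in isolation: a Meter exposes its own count through getCount(), just like a Counter. The sketch below reuses the naming scheme of runScenario with a made-up "demo" backend name; ReadCounts is a hypothetical class.

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

class ReadCounts {
    // Marks the meters and increments the counter the same way runScenario does;
    // returns {sum of both meter counts, counter value}.
    static long[] simulate(int cacheReads, int dbReads) {
        MetricRegistry registry = new MetricRegistry();
        Meter readCacheMeter = registry.meter("demo.cache.read");
        Meter readDbMeter = registry.meter("demo.db.read");
        Counter numberItemsRead = registry.counter("demo.total.read.count");
        for (int i = 0; i < cacheReads; i++) {
            readCacheMeter.mark();
            numberItemsRead.inc();
        }
        for (int i = 0; i < dbReads; i++) {
            readDbMeter.mark();
            numberItemsRead.inc();
        }
        return new long[] {readCacheMeter.getCount() + readDbMeter.getCount(), numberItemsRead.getCount()};
    }

    public static void main(String[] args) {
        long[] counts = simulate(3, 2);
        System.out.println(counts[0] + " == " + counts[1]); // prints "5 == 5"
    }
}
```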

You can learn more about Dropwizard metric types from its documentation.
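Back to runWithBenchmark: since Timer.Context.close() simply calls stop(), the try/finally can also be written with try-with-resources. This is an equivalent sketch, not the sample app's code; the TimedRunner class name is made up for the example.

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Hypothetical helper class; runWithBenchmark mirrors the method shown earlier.
class TimedRunner {
    static void runWithBenchmark(Runnable r, Timer readTimer) {
        // close() stops the context, even if r.run() throws
        try (Timer.Context ignored = readTimer.time()) {
            r.run();
        }
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Timer readTimer = registry.timer("demo.read");
        runWithBenchmark(() -> { }, readTimer);
        System.out.println("timed calls: " + readTimer.getCount()); // prints "timed calls: 1"
    }
}
```

Both forms record a duration for every call, including calls whose Runnable throws.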

Remember that since we associated the Hawkular reporter with the metrics registry, all metrics are automatically reported into Hawkular Metrics.

Now, let’s run the benchmark. I’m using Grafana with its Hawkular plugin to display graphs.

Upper-left: storage size (yellow = Guava, green = EhCache)
Upper-right: read response time (yellow = Guava, green = EhCache)
Bottom-left: read cache vs DB - mean rate (orange = Guava/db, yellow = Guava/cache, blue = EhCache/db, green = EhCache/cache)
Bottom-right: read cache vs DB - count (orange = Guava/db, yellow = Guava/cache, blue = EhCache/db, green = EhCache/cache)

We can see the Guava cache scenario in the first 5 minutes, followed by the EhCache scenario. Note how the storage size falls abruptly about halfway through the EhCache scenario: this is probably due to a cache eviction mechanism that is present by default (given we didn’t configure the cache at all).

We can correlate that with EhCache’s response time, which does not improve as fast as Guava’s while the cache fills up. However, we can suppose this is compensated by a smaller memory footprint.

Middleware → Dropwizard → Hawkular

So, we know how to create metrics. That’s perfect for tracking values that are very specific to an application. But the best part is that a lot of existing Java middleware already exposes tons of metrics through Dropwizard, which you can integrate very easily into your application.

There is a non-exhaustive list in the Dropwizard documentation (here and there). It includes EhCache, Apache HttpClient, Jetty, etc. But there are actually many others. Some frameworks, like Vert.x, may also report metrics directly to Hawkular, so you don’t even need to go through Dropwizard at all (but you still can).

Since we’re already using EhCache in our sample app, let’s try to get EhCache middleware metrics. First, we need to add a Maven dependency:

    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-ehcache</artifactId>
      <version>3.1.2</version>
    </dependency>

When we initialize EhCache programmatically, we create an InstrumentedEhcache object, which is its Dropwizard avatar:

    private Benchmark(MetricRegistry registry, Ehcache cache) {
        this.registry = registry;
        ehcache = InstrumentedEhcache.instrument(registry, cache);
    }

And then we use this InstrumentedEhcache instead of the initial EhCache object in the rest of our code. That’s it. Every time something is done on EhCache, metrics will be fed.

See, for instance, what we get in Grafana when the EhcacheBackend is invoked during our scenario:

Here we track some metrics such as the mean of gets and puts, and the number of in-memory hits and misses. See the full list of available metrics.

What else could we do… We’re on the JVM, right? We could get monitoring data from MXBeans (such as MemoryMXBean) and create our own metrics in Dropwizard, but there’s already a module that does the job:

    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-jvm</artifactId>
      <version>3.1.2</version>
    </dependency>

After creating the MetricRegistry, you can add some preset JVM metric sets, such as GarbageCollectorMetricSet, MemoryUsageGaugeSet and ThreadStatesGaugeSet.
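Registering those sets takes one call each. A sketch, assuming metrics-jvm 3.1.x, where MetricRegistry.register, when given a MetricSet, registers every metric of the set under "<prefix>.<name>". The jvm.* prefixes and the JvmMetrics class are arbitrary choices for the example.

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.jvm.GarbageCollectorMetricSet;
import com.codahale.metrics.jvm.MemoryUsageGaugeSet;
import com.codahale.metrics.jvm.ThreadStatesGaugeSet;

class JvmMetrics {
    static MetricRegistry registryWithJvmMetrics() {
        MetricRegistry registry = new MetricRegistry();
        // Each set is expanded into individual gauges named "<prefix>.<metric>"
        registry.register("jvm.gc", new GarbageCollectorMetricSet());
        registry.register("jvm.memory", new MemoryUsageGaugeSet());
        registry.register("jvm.threads", new ThreadStatesGaugeSet());
        return registry;
    }

    public static void main(String[] args) {
        MetricRegistry registry = registryWithJvmMetrics();
        Gauge<?> heapUsed = registry.getGauges().get("jvm.memory.heap.used");
        System.out.println("heap used (bytes): " + heapUsed.getValue());
    }
}
```

Since the Hawkular reporter is attached to the same registry, these gauges are reported along with the application metrics.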

Having them in Hawkular will help you quickly correlate information, such as increasing heap or non-heap memory usage related to the use of a cache, as in our example.

Heap vs non-heap memory used, plus some counters on threads and GC. See the drop in heap memory at about three quarters of the timeline? It matches the cache eviction in EhCache.

Note

An interesting fact is that the Cassandra database also exposes metrics through Dropwizard, and Hawkular uses Cassandra internally for metrics storage. This means that Hawkular’s storage can be self-monitored with the Hawkular Dropwizard reporter. If you want to read more on this subject, check out Cassandra metrics and some instructions here.

Fine-tuning the reporter

Tagging

There are some improvements we can bring to our sample app. First of all, we could tag our metrics.

Tagging may not seem very important at first sight, but over time when you get more and more metrics, and when you try to exploit them in a dynamic way, tags become crucial.

Even for this sample app, when building the Grafana dashboard, we soon want to make it generic so that it can show any other competing cache implementation. To do that, we create per-metric tags based on regular expressions, just by adding a few lines to the HawkularReporter builder:

    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .addRegexTag(Pattern.compile(GuavaBackend.NAME + "\\..*"), "impl", GuavaBackend.NAME)
            .addRegexTag(Pattern.compile(EhcacheBackend.NAME + "\\..*"), "impl", EhcacheBackend.NAME)
            .addGlobalTag("hostname", hostname)
            .prefixedWith(hostname + ".")
            .build();

As you can see, I also added a global tag with the hostname.

With that configuration, every metric whose name starts with "guava." is tagged "impl:guava", and similarly for ehcache. Every metric reported through this reporter is also tagged with the hostname.
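To make these rules explicit, here is a small model of how the regex tags and the global tag combine for a given metric name. It illustrates the behavior described above and is not the reporter's implementation: TagRules and its methods are hypothetical, GuavaBackend.NAME is assumed to be "guava", and patterns are assumed to be matched against the registry name before the host prefix is added.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of the tagging rules; not the Hawkular reporter's code.
class TagRules {
    private static final class RegexTag {
        final Pattern pattern;
        final String key;
        final String value;

        RegexTag(Pattern pattern, String key, String value) {
            this.pattern = pattern;
            this.key = key;
            this.value = value;
        }
    }

    private final List<RegexTag> regexTags = new ArrayList<>();
    private final Map<String, String> globalTags = new LinkedHashMap<>();

    TagRules regexTag(Pattern pattern, String key, String value) {
        regexTags.add(new RegexTag(pattern, key, value));
        return this;
    }

    TagRules globalTag(String key, String value) {
        globalTags.put(key, value);
        return this;
    }

    // Global tags apply to every metric; a regex tag applies only when the whole name matches.
    Map<String, String> tagsFor(String metricName) {
        Map<String, String> tags = new LinkedHashMap<>(globalTags);
        for (RegexTag rule : regexTags) {
            if (rule.pattern.matcher(metricName).matches()) {
                tags.put(rule.key, rule.value);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        TagRules rules = new TagRules()
                .regexTag(Pattern.compile("guava\\..*"), "impl", "guava")
                .regexTag(Pattern.compile("ehcache\\..*"), "impl", "ehcache")
                .globalTag("hostname", "host-a");
        System.out.println(rules.tagsFor("guava.cache.read")); // prints "{hostname=host-a, impl=guava}"
        System.out.println(rules.tagsFor("net.sf.ehcache.Cache.testCache.gets")); // prints "{hostname=host-a}"
    }
}
```

A Grafana query on "impl" can then pick up any new cache implementation without editing the dashboard.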

Filtering

If you use Grafana with this sample app, you’ve probably noticed how annoying it is to find and select the metric you want to display, because it’s buried among tons of other metrics. And obviously, the more metrics you store, the more resources are consumed. So you may want to filter out the metrics you don’t need.

There are two kinds of filters:

the usual built-in Dropwizard filters, that you can set using HawkularReporterBuilder.filter and by implementing MetricFilter
another kind of filter, very specific to the Hawkular reporter, called MetricComposition, for which I must provide some details:

As stated before, Dropwizard has several metric types (gauges, meters, timers, etc.), some of them being composed of multiple values. So they don’t map 1-1 to Hawkular metric types, which are made of simple values (basically, doubles for gauges and longs for counters; there are other types, but the Dropwizard reporter doesn’t use them).

In order not to lose any piece of data, Dropwizard metrics are exploded into several metrics in Hawkular. For instance, a Meter named guava.cache.read will be translated into 4 gauges (guava.cache.read.1minrt, guava.cache.read.5minrt, guava.cache.read.15minrt, guava.cache.read.meanrt) and 1 counter (guava.cache.read.count) in Hawkular. The full translation table is described here.
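As a quick illustration of that translation for a Meter, the hypothetical helper below lists the Hawkular metric names produced for one Dropwizard meter, using only the suffixes given above. It is not part of the reporter's API; other metric types follow their own rows of the translation table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the reporter's API.
class MeterTranslation {
    // Suffixes taken from the Meter example above: four gauges, then one counter
    private static final List<String> GAUGE_SUFFIXES = Arrays.asList("1minrt", "5minrt", "15minrt", "meanrt");
    private static final String COUNTER_SUFFIX = "count";

    static List<String> hawkularNamesForMeter(String meterName) {
        List<String> names = new ArrayList<>();
        for (String suffix : GAUGE_SUFFIXES) {
            names.add(meterName + "." + suffix); // reported as Hawkular gauges (doubles)
        }
        names.add(meterName + "." + COUNTER_SUFFIX); // reported as a Hawkular counter (long)
        return names;
    }

    public static void main(String[] args) {
        System.out.println(hawkularNamesForMeter("guava.cache.read"));
    }
}
```

Five stored series per meter is why the filtering discussed next pays off quickly.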

From the Dropwizard point of view, there is no metric called "guava.cache.read.1minrt", so you cannot filter it out with Dropwizard filters. However, you can act on the "metric composition" in the Hawkular reporter, either by providing the full metric name:

    // builder.
      .setMetricComposition("guava.cache.read", Lists.newArrayList("1minrt", "meanrt", "count"))

or using regexp, as I’m doing in the sample app:

    HawkularReporter hawkularReporter = HawkularReporter.builder(registry, "sample-tenant")
            .addRegexTag(Pattern.compile(GuavaBackend.NAME + "\\..*"), "impl", GuavaBackend.NAME)
            .addRegexTag(Pattern.compile(EhcacheBackend.NAME + "\\..*"), "impl", EhcacheBackend.NAME)
            .addGlobalTag("hostname", hostname)
            .prefixedWith(hostname + ".")
            .setRegexMetricComposition(Pattern.compile("net\\.sf\\.ehcache"), Lists.newArrayList("mean", "meanrt", "5minrt", "98perc", "count"))
            .setRegexMetricComposition(Pattern.compile(".*"), Lists.newArrayList("mean", "meanrt", "count"))
            .build();

Here, we configure all net.sf.ehcache.* metrics (EhCache middleware metrics) to provide their mean, meanrt, 5minrt, 98perc and count attributes. All other attributes will be discarded. For all other metrics we only keep mean, meanrt and count.

The declaration order matters, since only the first matching pattern will be used for a given metric name.
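To see why declaration order matters, here is a small model of the first-match rule. It is not the reporter's implementation: CompositionRules and its methods are hypothetical, and the sketch assumes a pattern applies when it is found anywhere in the name (Matcher.find()), which is consistent with net\\.sf\\.ehcache covering the longer net.sf.ehcache.* names. An empty Optional stands for "no rule matched".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical model of "only the first matching pattern is used".
class CompositionRules {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<List<String>> compositions = new ArrayList<>();

    CompositionRules addRule(Pattern pattern, List<String> composition) {
        patterns.add(pattern);
        compositions.add(composition);
        return this;
    }

    // Rules are checked in declaration order; once one matches, later rules are ignored.
    Optional<List<String>> compositionFor(String metricName) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(metricName).find()) {
                return Optional.of(compositions.get(i));
            }
        }
        return Optional.empty(); // no rule applies to this name
    }

    public static void main(String[] args) {
        CompositionRules rules = new CompositionRules()
                .addRule(Pattern.compile("net\\.sf\\.ehcache"),
                        Arrays.asList("mean", "meanrt", "5minrt", "98perc", "count"))
                .addRule(Pattern.compile(".*"), Arrays.asList("mean", "meanrt", "count"));
        // prints "[mean, meanrt, 5minrt, 98perc, count]"
        System.out.println(rules.compositionFor("net.sf.ehcache.Cache.testCache.gets").get());
        // prints "[mean, meanrt, count]"
        System.out.println(rules.compositionFor("guava.cache.read").get());
    }
}
```

Swapping the two rules would let .* match every name first, so the EhCache metrics would silently lose their 5minrt and 98perc attributes.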

Tip	Using plain strings rather than regexps for metric composition is more efficient, since they are indexed internally in a HashMap.
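For the first kind of filter, a plain Dropwizard MetricFilter is just a predicate on the registry name and the metric. The hypothetical filter below drops everything under a given prefix; an object like this is what you would hand to the HawkularReporterBuilder.filter method mentioned earlier. The demonstration uses MetricRegistry.getMeters(MetricFilter) so that it runs without a Hawkular server, and the ExcludePrefixFilter name is made up.

```java
import com.codahale.metrics.Metric;
import com.codahale.metrics.MetricFilter;
import com.codahale.metrics.MetricRegistry;

// Hypothetical filter: rejects every metric whose registry name starts with a given prefix.
class ExcludePrefixFilter implements MetricFilter {
    private final String excludedPrefix;

    ExcludePrefixFilter(String excludedPrefix) {
        this.excludedPrefix = excludedPrefix;
    }

    @Override
    public boolean matches(String name, Metric metric) {
        // true = keep the metric, false = filter it out
        return !name.startsWith(excludedPrefix);
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.meter("guava.cache.read");
        registry.meter("net.sf.ehcache.Cache.testCache.hits");
        MetricFilter filter = new ExcludePrefixFilter("net.sf.ehcache");
        System.out.println(registry.getMeters(filter).keySet()); // prints "[guava.cache.read]"
    }
}
```

Unlike metric composition, such a filter works on whole Dropwizard metrics: it can drop a Meter entirely, but not just its 1minrt part.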

That was a fairly complete tour of the Hawkular Dropwizard reporter. Some useful links:

The sample app used to illustrate this article: https://github.com/jotak/hawkular-dropwizard-sample
The Grafana dashboard I used (exported json): https://raw.githubusercontent.com/jotak/hawkular-dropwizard-sample/master/grafana/grafana-dropwizard-sample.json
The GitHub page of the reporter itself, along with its documentation, is here: https://github.com/hawkular/hawkular-dropwizard-reporter

Published by Joel Takvorian on 16 January 2017