Hawkular Alerting for Developers

Hawkular Alerting Developer Guide - Version 2

Introduction

Using Hawkular Alerting 1.x? See the Version 1 Developer Guide here.

Hawkular Alerting is a component of the Hawkular management and monitoring project. Its goal is to provide flexible and scalable alerting services in an easily consumable way.

The Hawkular Alerting project lives on GitHub.

Alerting Philosophy

Alerting is useful, necessary, and typically an integral part of operational sanity. Done well, it strikes the right balance between human intervention and automation. Done poorly, it is an ineffective nuisance. Hawkular Alerting tries to provide the tools to do things well, but it can just as easily be abused. Alerts bring attention to a problem, or a developing problem, that typically requires human intervention to resolve. As much as possible, an alert should represent high-level symptoms that affect the user experience. The number of generated alerts should be small, because a human can only respond to a few situations daily. The same alert should not be repeated, or repeated only if a response has not been initiated.

Alerts

Alerts are generated when an Alert Trigger fires, based on a set of defined conditions that have been matched, possibly more than once, or that have held true over a period of time. When fired, the trigger can perform actions, typically but not limited to notifications (e-mail, SMS, etc.). Alerts then move through the Open, Acknowledged, Resolved lifecycle. There are many options on triggers to help ensure that alerts are not generated too frequently, including ways of automatically disabling and enabling the trigger.

Events

As discussed above the number of alerts should be small in order to be manageable. But it can be useful to capture interesting happenings in the monitored world. This is called an Event in Hawkular Alerting. An event can be roughly thought of as an alert without lifecycle. Like alerts, an event can be generated by a trigger but unlike an alert, it can also be injected directly via the API, so it is very easy for clients to insert events as desired. And although trigger-generated events can define actions to be performed, in general an event does not need human intervention. Instead, it is typically something that can contribute to an alert firing, help investigate an alert, or simply help understand system behavior.

It is expected that the number of Events can be very large compared to the number of Alerts. Events, like Alerts, can be flexibly queried.

Triggers

A Trigger defines the conditions that, when satisfied, cause the trigger to fire. Triggers can have one or more conditions and can be configured to fire when ANY or ALL of the conditions are met. A trigger can generate an Alert or an Event.

Conditions

There are several different kinds of conditions but they all have one thing in common, each requires some piece of data against which the condition is evaluated. Here are the different kinds of conditions:

  • Availability

    • X is DOWN

  • Compare

    • X < 80% Y

  • Event

    • event.id starts 'IDXYZ', event.tag.category == 'Server', event.tag.from ends '.com'

  • Missing

    • No event/data received for X in the last 5 minutes

  • Nelson

    • X violates one or more of the Nelson rules

  • Rate

    • X > 10 per-minute

  • String

    • X starts with "ABC", X matches "A.*B"

  • Threshold

    • X < 10, X >= 20

  • ThresholdRange

    • X inside [10,20), X outside [100,200]

Most conditions deal with numeric data, but String and Availability data are also supported. A trigger can combine conditions dealing with data of different types and from different sources.
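
For illustration, here is a sketch of a conditions array combining a Threshold condition and an Availability condition. The dataIds are hypothetical and the field layout follows the trigger definition examples later in this guide:

"conditions":[
  {
    "type": "THRESHOLD",
    "dataId": "ResponseTime",
    "operator": "GT",
    "threshold": 1000.0
  },
  {
    "type": "AVAILABILITY",
    "dataId": "WebServer",
    "operator": "DOWN"
  }
]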

Actions

The whole purpose of alerting is to be able to immediately respond to a developing or active problem. Hawkular Alerting provides several plugins to take action when alerts are generated. Custom action plugins can be defined as well. The list of provided action plugins keeps growing. Here is a sample:

  • E-mail notification

  • SMS notification

  • SNMP notification

  • Pager Duty integration

  • Aerogear integration

  • File-system notification

  • Webhook notification

  • Elasticsearch integration

Trigger Dampening

It’s often the case that you don’t want a trigger to fire every time a condition set is met. Instead, you want to ensure that the issue is not a spike of activity, or that you don’t flood an on-call engineer with alerts. Hawkular Alerting provides several ways of ensuring triggers fire only as desired. We call this "Trigger Dampening". An example is useful for understanding dampening.

Let’s say we have a trigger with a single condition: responseTime > 1s.

It is important to understand how the reporting interval plays into alerting, and into dampening. Assume responseTime is reported every 15s. That means we get roughly 4 data points every minute, and therefore evaluate the condition around 4 times a minute.

Here are the different trigger dampening types:

Strict

  • N consecutive true evaluations

  • Useful for ignoring spikes in activity or waiting for a prolonged event

In our example this could be, "Fire the trigger only if responseTime > 1s for 6 consecutive evaluations". So, given a 15s reporting interval this means response time would likely have been high for about 90s. But note that if the reporting interval changes the firing time will change. This is used more when the number of evaluations is more important than the time it takes to fire.

Note that the default dampening for triggers is Strict(1), which just means that by default a trigger fires every time its condition set evaluates to true.
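
For illustration, a Strict(6) dampening definition for this example might look like the following sketch. The trigger id is hypothetical, and evalTrueSetting is assumed to be the field holding the number of required consecutive true evaluations:

{
  "triggerId": "example-trigger",
  "triggerMode": "FIRING",
  "type": "STRICT",
  "evalTrueSetting": 6
}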

Relaxed Count

  • N true evaluations out of M total evaluations

  • Useful for ignoring short spikes in activity but catching frequently spiking activity

In our example this could be, "Fire the trigger only if responseTime > 1s for 4 of 8 evaluations". This means the trigger will fire if roughly half the time we are exceeding a 1s response time. Given a 15s reporting interval this means the trigger could fire in 1 to 2 minutes of accumulated evaluations. But note that if the reporting interval changes the firing time will change. This is used more when the number of evaluations is more important than the time it takes to fire.

Relaxed Time

  • N true evaluations in T time

  • Useful for ignoring short spikes in activity but catching frequently spiking activity

In our example this could be, "Fire the trigger only if responseTime > 1s 4 times in 5 minutes". This means the trigger will fire if we exceed 1s response time multiple times in a 5 minute period. Given a 15s reporting interval this means the trigger could fire in 1 to 5 minutes of accumulated evaluations. But note that if the reporting interval changes the firing time will change. And also note that the trigger will never fire if we don’t receive at least 4 reports in the specified 5 minute period. This is used when you don’t want to exceed a certain period of time before firing.

Strict Time

  • Only true evaluations for at least T time

  • Useful for reporting a continued aberration

In our example this could be, "Fire the trigger only if responseTime > 1s for at least 5 minutes". This means the trigger will fire if we exceed 1s response time on every report for a 5 minute period. Given a 15s reporting interval this means the trigger will fire after roughly 20 consecutive true evaluations. Note that if the reporting interval changes the firing time will remain roughly the same. It is important to understand that at least 2 evaluations are required. The first true evaluation starts the clock. Any false evaluation stops the clock. Assuming only true evaluations, the trigger fires on the first true evaluation at or after the specified period. The shorter the reporting interval the closer the firing time will be to the specified period, T.

Strict Timeout

  • Only true evaluations for T time

  • Useful for reporting a continued aberration with a more guaranteed firing time

In our example this could be, "Fire the trigger only if responseTime > 1s for 5 minutes". This means the trigger will fire if we exceed 1s response time on every report for a 5 minute period. Given a 15s reporting interval this means the trigger will fire after roughly 20 consecutive true evaluations. Note that if the reporting interval changes the firing time will remain the same. It is important to understand that only 1 evaluation is required. The first true evaluation starts the clock. Assuming only true evaluations, the trigger fires at T, when a timer expires and fires the trigger. Any false evaluation stops the clock and cancels the timer. This type of dampening has more processing overhead because the trigger evaluation requires an external timer.

AutoDisable

A trigger can be set for AutoDisable. Whereas dampening can limit the firing rate of a trigger, disabling a trigger completely stops the trigger from firing (or being evaluated). A trigger can be manually enabled and disabled via the REST API, but it can also be disabled automatically. If the trigger has the autoDisable option set to true then after it fires it is disabled, preventing any subsequent alerts until it is manually re-enabled. The default is false.

AutoEnable

A trigger can be set for AutoEnable. If AutoEnable is true then when an alert is resolved, and if all alerts for the trigger are then resolved, the trigger will be enabled if it is currently disabled. This ensures that the trigger will again go into firing mode, without needing to be manually enabled by the user. The default is false.
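
A minimal sketch of a trigger definition enabling both options (other trigger fields omitted; the id and name are hypothetical):

{
  "trigger":{
    "id": "example-trigger",
    "name": "Example Trigger",
    "autoDisable": true,
    "autoEnable": true
  }
}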

Source

By default both Triggers and Data ignore "source". This means that the dataIds defined on a trigger’s conditions are matched against the dataIds on incoming data (within a tenant) and matching data is evaluated against the conditions. It is possible to qualify triggers and data with a "source" such that a trigger only evaluates data having the same source.

This mechanism is used automatically by Data-Driven Group Triggers but can be used manually as well. If you find that data is better described using a combination source+id, as opposed to just id, then this approach may be appropriate.

Group Triggers

It’s often the case that the same alerting needs to be applied to all instances of the same thing. For example, it may be useful to alert on "System Load > 80%" on 50 different CPUs. It can be cumbersome to manage 50 individual triggers.

A Group Trigger allows you to define a single trigger and then apply it to a group of logically similar things. A group trigger could be used in the example above. Then, a member could be added for each CPU. The member triggers are basically managed copies of the group trigger. Changes at the group level are pushed down to the members. So, to change "80%" to "85%", or to change autoDisable from false to true, only the group trigger must be changed.

Managing DataIds

The group trigger is basically a template; it is not deployed. Only the member triggers are deployed and actively evaluated, because only the member triggers are associated with real dataIds on the conditions. The group trigger uses "tokens" for the dataIds and each member, when defined, must provide a map of dataId token replacements.

Using the example above, our group trigger would define a condition using a dataId token, like:

{ type: "threshold",
  dataId: "SystemLoad",
  operator: "GT"
  threshold: "80.0"
}

When adding a member for a specific CPU, say CPU-1, we’d map the token to the real dataId, something like:

dataIdMap: {
  "SystemLoad":"CPU-1_SystemLoad"
}

Where "CPU-1_SystemLoad" reflects the actual id associated with system load data sent to alerts for CPU-1.

When updating conditions at the group level it is necessary to supply dataId mappings for all of the existing members because the dataIds may have changed on the new condition set.

Orphans

There are times when a particular group member may need to be managed individually. For example, if a single CPU is of particular concern it may be useful to change the threshold level on just that member. It is possible to orphan a member trigger and manage it independently, while maintaining its association with the group trigger. It can be unorphaned at any time, and reset to the group settings.

Data-Driven Group Triggers

Group triggers allow a common definition to be applied to logically similar members. For example, a group trigger could be defined for alerting on CPU SystemLoad and a member trigger would be added for every CPU, each a copy of the group trigger but working against the proper dataId(s) given the CPU instance. When a member is added, a map from the group’s [token] dataIds to the member’s [real] dataIds must be provided. And when updating conditions at the group level, a map for each existing member must be provided. This makes sense, and is fine, but it can be tedious or difficult to supply.

It’s not uncommon for the member-level dataIds to be a concatenation of the id of the source member (e.g. a resourceId, CPU-1, etc.) and the group-level dataId token (SystemLoad). So you end up with member-level ids like 'CPU-1_SystemLoad' where the "source" is 'CPU-1' and the dataId is 'SystemLoad'.

Data-Driven Group Triggers are able to add member triggers to a group automatically, one for each "source" of the same data. In other words, for a group trigger on CPU SystemLoad, add a member automatically for each source CPU reporting the 'SystemLoad' metric. By reporting data as a combination of source and dataId this should be possible. So, instead of reporting:

Data(id:cpu-1-Load, value:123)

We’d want:

Data(source:cpu-1, id:Load, value:123)

This would then relieve the client from having to add member triggers up front and instead assume that the group will grow as needed, based on the incoming data.

Because dataIds are often defined upstream it is not always possible to supply Hawkular Alerting with data such that the source and id are separated. But when possible this is a powerful approach.

Behavioral Notes

A couple of notes about data-driven group triggers:

  • Each member trigger is associated with a single source and only considers data from that source.

    • True for single and multi-condition triggers.

  • Condition changes in the group trigger will remove all member triggers.

    • The members will then again be created as the data demands.

  • The Source mechanism can also be used with manually managed triggers, if desired.

Alert Lifecycle

Hawkular Alerting can integrate with other systems to handle Alert Lifecycle, but alerts can also be managed directly within the tool. Hawkular Alerting supports a typical move through a simple lifecycle. An alert starts in OPEN status, optionally moves to ACKNOWLEDGED to indicate the alert has been seen and the issue is being resolved, and is finally set to RESOLVED to indicate the problem has been fixed.

AutoResolve

Triggers require firing conditions and always start in Firing mode. But the trigger can optionally supply autoResolve conditions. If autoResolve=true then after a trigger fires it switches to AutoResolve mode. In AutoResolve mode the trigger no longer looks for problem conditions, but instead looks for evidence that the problem is resolved. A simple example would be a trigger that has a firing condition of Availability DOWN, and an autoResolve condition of Availability UP. This mechanism ensures that only one alert is generated for a problem, and that when the problem has been resolved, the trigger automatically returns to firing mode.

Moreover, if autoResolveAlerts=true then when the AutoResolve conditions are satisfied all of the trigger’s unresolved alerts will be automatically set to RESOLVED.

Like Firing mode, AutoResolve mode can optionally define its own dampening setting.
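
A sketch of the availability example above, assuming the AUTORESOLVE trigger mode and AVAILABILITY condition type used by the model (ids and dataIds are hypothetical):

{
  "trigger":{
    "id": "web-availability",
    "name": "Web Server Availability",
    "autoResolve": true,
    "autoResolveAlerts": true
  },
  "conditions":[
    {
      "triggerMode": "FIRING",
      "type": "AVAILABILITY",
      "dataId": "WebServer",
      "operator": "DOWN"
    },
    {
      "triggerMode": "AUTORESOLVE",
      "type": "AVAILABILITY",
      "dataId": "WebServer",
      "operator": "UP"
    }
  ]
}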

Tags

Tags can have a variety of uses but are commonly used to assist in search. Tags are free-form name-value pairs and can be applied to:

  • Triggers

  • Alerts

  • Events

Tags on triggers are automatically passed on to the Alerts or Events generated by that trigger. This allows the same search criteria used to fetch triggers to also be used to fetch the alerts or events generated by those triggers.

A tag’s name and value must both be non-null.

Tags Query Language

<tag_query> ::= ( <expression> | "(" <object> ")"
| <object> <logical_operator> <object> )
<expression> ::= ( <tag_name> | <not> <tag_name>
| <tag_name> <boolean_operator> <tag_value>
| <tag_name> <array_operator> <array> )
<not> ::= [ "NOT" | "not" ]
<logical_operator> ::= [ "AND" | "OR" | "and" | "or" ]
<boolean_operator> ::= [ "=" | "!=" ]
<array_operator> ::= [ "IN" | "NOT IN" | "in" | "not in" ]
<array> ::= ( "[" "]" | "[" <tag_value> ( "," <tag_value> )* "]" )
<tag_name> ::= <identifier>
<tag_value> ::= ( "'" <regexp> "'" | <simple_value> )
;
; <identifier> and <simple_value> follow pattern [a-zA-Z_0-9][\\-a-zA-Z_0-9]*
; <regexp> follows any valid Java Regular Expression format
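
For illustration, here are a few queries that conform to this grammar (the tag names and values are hypothetical):

datacenter                             The tag "datacenter" is present
NOT datacenter                         The tag "datacenter" is not present
datacenter = dc1                       Tag value equality
datacenter != dc1 AND app = web        Logical combination of expressions
datacenter IN [dc1, dc2]               Tag value matches one of an array of values
app = 'web-.*'                         Values quoted with ' are Java regular expressions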

External Alert Integration

There are times when an external system will already be looking for and detecting potential issues in its environment. It is possible for these detection-only systems to leverage the power of Hawkular Alerting's trigger and action infrastructure. For example, let’s say there is already a sensor in place looking for overheating situations. When it detects something overheating it can take some action. In this case we are not sending a stream of heat readings to alerting and having it evaluate against a threshold set on a trigger condition. Instead, the threshold and evaluation are all built into the sensor. To integrate with Hawkular Alerting we can use an "External Condition".

External Conditions

External integration begins with standard triggers. In this way we immediately get everything that triggers offer: actions, dampening, lifecycle, auto-resolve, etc. The difference is that instead of the typical condition types (Threshold, Availability, etc.) we use an ExternalCondition. An external condition is like other conditions in that it has a 'dataId' with which it matches data sent into Hawkular Alerting. It also has 'systemId' and 'expression' fields. The systemId is used to identify the external system for which the condition is relevant. In our example, perhaps "HeatSensors". The expression field is used as needed. In our example it may not be needed, or it could be a description like "sensor detected high temperature". In other examples it could be used to store a complex expression that will be evaluated by the external system.

The main thing about external conditions is that they always evaluate to true. It is assumed that when a datum comes in with a dataId assigned to an external condition, that condition immediately evaluates to true. A trigger with a single external condition (and default dampening) would fire on every datum sent in for its condition. This is because it is assumed the external system already did the work of determining there was an issue.

Note that the string data sent in has any value the external alerter system wants it to be. In our example it may be a sensorId and temperature, like "Sensor 5368, temperature 212F".
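
For illustration, the heat sensor example as a condition entry might look like the following sketch. It uses the alerterId field, as in the EventsAggregation and Prometheus examples later in this guide (the prose above refers to this field as systemId); the dataId and expression values are hypothetical:

{
  "triggerMode": "FIRING",
  "type": "EXTERNAL",
  "alerterId": "HeatSensors",
  "dataId": "heat-sensor-1",
  "expression": "sensor detected high temperature"
}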

Action Plugins

Plugins are responsible for executing actions when an alert, or possibly an event, happens.

Actions can be a notification task or a complex process.

Hawkular Alerting provides a plugin architecture to extend and add new behaviours.

Create a new plugin

We can add a new plugin to Hawkular in several steps:

  • Create a new project under hawkular-alerts-actions-plugins.

    You can use an existing one as a template, e.g. hawkular-alerts-actions-generic.

  • Add an implementation of the org.hawkular.alerts.actions.api.ActionPluginListener interface.

  • Add a plugin name to the implementation with the org.hawkular.alerts.actions.api.ActionPlugin annotation.

For example:

@ActionPlugin(name = "file")
public class FilePlugin implements ActionPluginListener {
    ...
}

ActionPluginListener interface

This interface is responsible for the following:

  • Define which properties and default values are supported by a plugin

...
    /**
     * The alerts engine registers the plugins available with their properties.
     * This method is invoked at plugin registration time.
     *
     * @return a list of properties available on this plugin
     */
    Set<String> getProperties();

    /**
     * The alerts engine registers the plugins available with their default values.
     * This method is invoked at plugin registration time.
     * Default values can be modified by the alerts engine.
     *
     *
     * @return a list of default values for properties available on this plugin
     */
    Map<String, String> getDefaultProperties();
...
  • Process an incoming action message wrapped as a org.hawkular.alerts.actions.api.ActionMessage

...
    /**
     * This method is invoked by the ActionService to process a new action generated by the engine.
     *
     * @param msg message received to be processed by the plugin
     * @throws Exception any problem
     */
    void process(ActionMessage msg) throws Exception;
...

ActionMessage interface

This interface wraps the action sent by the engine, together with the effective properties the plugin should use to process it.

package org.hawkular.alerts.actions.api;

import java.util.Map;

import org.hawkular.alerts.api.model.action.Action;

import com.fasterxml.jackson.annotation.JsonInclude;

/**
 * A message sent to the plugin from the alerts engine
 * It has the event payload as well as action properties
 *
 * @author Lucas Ponce
 */
public interface ActionMessage {

    @JsonInclude
    Action getAction();
}

The class org.hawkular.alerts.api.model.action.Action is generated by the engine and it has the event detail as part of its payload.

/**
 * A base class for action representation from the perspective of the alerts engine.
 * An action is the abstract concept of a consequence of an event.
 * A Trigger definition can be linked with a list of actions.
 *
 * Alert engine only needs to know an action id and message/payload.
 * Action payload can optionally have an event as payload.
 *
 * Action plugins will be responsible to process the action according to its own plugin configuration.
 *
 * @author Jay Shaughnessy
 * @author Lucas Ponce
 */
public class Action {

    @JsonInclude
    private String tenantId;

    @JsonInclude
    private String actionPlugin;

    @JsonInclude
    private String actionId;

    @JsonInclude(Include.NON_NULL)
    private String eventId;
...
}
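
Putting the pieces together, a minimal plugin implementation might look like the following sketch. The plugin name "stdout" and the "prefix" property are invented for illustration, and Action.getProperties() is assumed to expose the effective action properties:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.hawkular.alerts.actions.api.ActionMessage;
import org.hawkular.alerts.actions.api.ActionPlugin;
import org.hawkular.alerts.actions.api.ActionPluginListener;

@ActionPlugin(name = "stdout")
public class StdoutPlugin implements ActionPluginListener {

    @Override
    public Set<String> getProperties() {
        // Declare the properties supported by this plugin.
        Set<String> properties = new HashSet<>();
        properties.add("prefix");
        return properties;
    }

    @Override
    public Map<String, String> getDefaultProperties() {
        // Default values used when an action does not override them.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("prefix", "[alert]");
        return defaults;
    }

    @Override
    public void process(ActionMessage msg) throws Exception {
        // Assumption: the Action carries the effective properties for this action.
        Map<String, String> props = msg.getAction().getProperties();
        String prefix = (props != null) ? props.getOrDefault("prefix", "[alert]") : "[alert]";
        System.out.println(prefix + " " + msg.getAction());
    }
}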

Email Action Plugin

Plugin Name

email

Property Description Default value

mail

"mail" property is used as main prefix for javax.mail.Session properties.

All "mail.<protocol>.<value>" properties are passed to mail Session.
Properties can be defined per action based.
If not properties defined at action level, it takes default plugin properties.

For these special "mail" properties, if not properties defined at action plugin, it will search at System.getProperties() level.

-

from

"from" property defines the sender of the plugin email.

Additional "from" properties can be defined to discriminate by alert state:
- "from.open": sender when alert is in open state
- "from.acknowledged": sender when alert is in acknowledge state
- "from.resolved": sender when alert is in acknowledge state

Discriminated properties have priority.

-

from-name

"from-name" property defines the name of the sender of the plugin email.

Additional "from-name" properties can be defined to discriminate by alert state:
- "from-name.open": name of the sender when alert is in open state
- "from-name.acknowledged": name of the sender when alert is in acknowledge state
- "from-name.resolved": name of the sender when alert is in acknowledge state

Discriminated properties have priority.

-

to

"to" property defines the recipient of the plugin email.

Additional "to" properties can be defined to discriminate by alert state:
- "to.open": recipient when alert is in open state
- "to.acknowledged": recipient when alert is in acknowledge state
- "to.resolved": recipient when alert is in acknowledge state

Discriminated properties have priority.

-

cc

"cc" property defines the extra recipients of the plugin email.

Additional "cc" properties can be defined to discriminate by alert state:
- "cc.open": extra recipients when alert is in open state
- "cc.acknowledged": extra recipients when alert is in acknowledge state
- "cc.resolved": extra recipients when alert is in acknowledge state

Discriminated properties have priority.

-

template.hawkular.url

"template.hawkular.url" property defines the URL that will be used in the template email to point to hawkular server. If not "template.hawkular.url" defined, then the plugin looks into system env HAWKULAR_BASE_URL.

-

template.locale

The Email plugin supports localized templates.

"template.locale" is the property used to define which template to use for a specific locale.

e.g. a plugin may define several templates to support multiple locales [es,en,fr], but we can define a specific locale per action [es].

-

template.plain

"template.plain" property defines the template used for plain text email.

Additional "template.plain" properties can be defined to support localization:
- "template.plain.LOCALE": where LOCALE is a variable that can point to specific localization.

Templates are plain text based on the http://freemarker.org/ engine.
The Email plugin processes the alert payload and adds a set of pre-defined variables that can be used in the template.
The list of variables available for templates is wrapped in the PluginMessageDescription class.

-

template.html

"template.html" property defines the template used for html email.

Additional "template.html" properties can be defined to support localization:
- "template.html.LOCALE": where LOCALE is a variable that can point to specific localization.

The Email plugin uses templates based on the http://freemarker.org/ engine.
The Email plugin processes the alert payload and adds a set of pre-defined variables that can be used in the template.
The list of variables available for templates is wrapped in the PluginMessageDescription class.

-
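
Putting these properties together, an email action definition might look like the following sketch (the actionId and addresses are hypothetical):

"actions":[
  {
    "actionPlugin": "email",
    "actionId": "notify-admins",
    "properties": {
      "from": "alerts@example.com",
      "from-name": "Hawkular Alerting",
      "to": "admins@example.com",
      "cc.resolved": "oncall@example.com"
    }
  }
]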

More details on the Email plugin and its templates can be found in the EmailPlugin and EmailTemplate classes.

WebHook Action Plugin

Plugin Name

webhook

Property Description Default value

url

"url" property defines the url of the webhook to invoke.

-

method

"method" property defines the HTTP method used with the webhook url to invoke.

The Plugin will always send the JSON Event processed.

-

timeout

"timeout" property defines the connection timeout (ms) for the webhook url to invoke.

-
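
An example webhook action definition using these properties (the actionId and url are hypothetical):

"actions":[
  {
    "actionPlugin": "webhook",
    "actionId": "post-to-ops",
    "properties": {
      "url": "http://ops.example.com/hooks/alerts",
      "method": "POST",
      "timeout": "5000"
    }
  }
]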

More details on the WebHook plugin can be found in the WebHookPlugin class.

File Action Plugin

Plugin Name

file

Property Description Default value

path

Path to store file notifications

System.getProperty("java.io.tmpdir")
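
An example file action definition using this property (the actionId and path are hypothetical):

"actions":[
  {
    "actionPlugin": "file",
    "actionId": "write-to-disk",
    "properties": {
      "path": "/var/log/hawkular-alerts"
    }
  }
]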

More details on the File plugin can be found in the FilePlugin class.

Elasticsearch Action Plugin

See the Elasticsearch Integration section below.

Apache Kafka Action Plugin

See the Apache Kafka Integration section below.

Experimental Action Plugins

In Hawkular Alerting 1.x some plugins were developed as integration examples with third party systems.

These plugins have not yet been migrated to Hawkular Alerting 2.x, but their logic is straightforward enough to use as a base for custom integrations.

Aerogear

Pagerduty

Twilio

Engine Extensions

Engine extensions are listeners that can operate on Data or Events received before the engine processes them.

Extensions can implement a variety of use cases where transformation or filtering of incoming Data or Events might be necessary.

Extensions are executed in a pipeline ordered by registration time.

Extensions must implement a DataExtension or EventExtension interface and be registered through the ExtensionsService.

DataExtension interface

public interface DataExtension {

    /**
     * The extension processes the supplied Data and returns Data to be forwarded, if any.
     *
     * @param data The Data to be processed by the extension.
     * @return The set of Data to be forwarded to the next extension, or core engine if this is the final extension.
     */
    TreeSet<Data> processData(TreeSet<Data> data);

}

EventExtension interface

public interface EventExtension {

    /**
     * The extension processes the supplied Events and returns Events to be forwarded, if any.
     *
     * @param events The Events to be processed by the extension.
     * @return The set of Events to be forwarded to the next extension, or core engine if this is the final extension.
     */
    TreeSet<Event> processEvents(TreeSet<Event> events);

}
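
As an illustration, a minimal EventExtension implementation might look like the following sketch; it drops events of a hypothetical "DEBUG" category and forwards everything else:

import java.util.TreeSet;
import java.util.stream.Collectors;

import org.hawkular.alerts.api.model.event.Event;

public class DropDebugEventsExtension implements EventExtension {

    @Override
    public TreeSet<Event> processEvents(TreeSet<Event> events) {
        // Forward every event except those in the "DEBUG" category.
        return events.stream()
                .filter(e -> !"DEBUG".equals(e.getCategory()))
                .collect(Collectors.toCollection(TreeSet::new));
    }
}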

Events Aggregation Extension

The Events Aggregation Extension allows you to define sliding windows over Events and to define expressions on the aggregated data.

To use this feature a Trigger must have the HawkularExtension tag with value EventsAggregation. It must then define an ExternalCondition with the alerterId set to EventsAggregation, as shown in the example:

{
  "triggers":[
    {
      "trigger":{
        "id": "marketing-scenario",
        "name": "Marketing Scenario",
        "description": "Detect when a customer buys several items in a short period of time",
        "severity": "HIGH",
        "enabled": true,
        "actions":[
          {
            "actionPlugin": "email",
            "actionId": "notify-to-marketing"
          }
        ],
        "tags":{
            "HawkularExtension":"EventsAggregation"
        }
      },
      "conditions":[
        {
          "triggerMode": "FIRING",
          "type": "EXTERNAL",
          "alerterId":"EventsAggregation",
          "dataId": "marketing",
          "expression": "event:groupBy(context.accountId):window(time,10s):having(count > 2)"
        }
      ]
    }
  ],
  "actions":[
    {
      "actionPlugin": "email",
      "actionId": "notify-to-marketing",
      "properties": {
        "to": "marketing@hawkular.org"
      }
    }
  ]
}

All events tagged with HawkularExtension=EventsAggregation will be filtered out and processed asynchronously by the extension, applying the aggregation rules defined in the ExternalCondition expressions.

Events Aggregation Expressions

An ExternalCondition used for EventsAggregation alerter defines a DSL expression which is parsed internally by the extension into a DRL format understandable by the JBoss Rules CEP engine.

The DSL expression defines Event grouping by fields and additional filtering options:

<expression> ::= "event:groupBy(" <field> ")" [ ":window(" <window> ")" ] [ ":filter(" <filter> ")" ] [ ":having(" <having> ")" ]

<field> ::= [ "tags." | "context." ] <field name>

<window> ::= ( "time," <time_value> | "length," <numeric_value> )

<time_value> ::= [ <numeric_value> "d" ][ <numeric_value> "h" ][ <numeric_value> "m" ][ <numeric_value> "s" ] [ <numeric_value> [ "ms" ]]

<filter> ::= <drools_expression>

<having> ::= <drools_expression>

For example, the expression

event:groupBy(context.accountId):window(time,10s):having(count > 2)

can be described as follows

groupBy(context.accountId)      Group window events by context "accountId" field
window(time,10s)                Define a sliding time window of 10 seconds
having(count > 2)               Define an expression on the grouped events

In other words, this condition will be true each time there are more than two events with the same accountId within a 10 second window.

The DSL can operate on Events fields, as well as context and tags, as it is shown in the previous example and here:

event:groupBy(tags.accountId):window(time,10s):having(count > 1, count.tags.location > 1)

where

groupBy(tags.accountId)                       Group window events by "accountId" tag
window(time,10s)                              Define a sliding time window of 10 seconds
having(count > 1, count.tags.location > 1)    Define an expression on the grouped events

This condition will be true when there is more than one event and more than one distinct location tag among them, detecting when events for the same accountId happen from different places.

The two previous expressions group all events for the timing window.

We might have scenarios where only specific events should be grouped.

For these cases we can add filters into the expressions like in the following example:

event:groupBy(tags.traceId):filter((category == "Credit Check" && text == "Exceptionally Good") || (category == "Stock Check" && text == "Out of Stock")):having(count > 1, count.tags.accountId == 1)

This expression will group only the events that match the filter

filter(
    (category == "Credit Check" && text == "Exceptionally Good") ||
    (category == "Stock Check" && text == "Out of Stock")
)

Note that this expression doesn’t define an explicit sliding time window, so it will use a default expiration window.

Additional details can be found in the JavaDoc of the implementation and in the examples.

Elasticsearch Integration

Elasticsearch Alerter

The Elasticsearch Alerter listens for triggers tagged with "Elasticsearch". The Alerter schedules a periodic query to an Elasticsearch system with the info provided in the tagged trigger’s context, converts the fetched Elasticsearch documents into Hawkular Alerting Events, and sends them to the Alerting engine.

The Elasticsearch Alerter uses the following conventions for trigger tags and context:

Required

trigger.tags["Elasticsearch"] = "<any description>"

The Elasticsearch tag is required for the alerter to detect that this trigger queries an Elasticsearch system.
The value is not necessary; it can be used as a description and is reserved for future use.

i.e.
trigger.tags["Elasticsearch"] = "" // Empty value is valid
trigger.tags["Elasticsearch"] = "OpenShift Logging System" // It can be used as description

Required

trigger.context["timestamp"] = "<timestamp field>"

This defines the timestamp field name for the fetched Elasticsearch documents.
The timestamp field is used to fetch documents on an interval basis, without overlap.

If a specific pattern is not defined under trigger.context["timestamp_pattern"], the default patterns are used:

"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"
"yyyy-MM-dd'T'HH:mm:ssZ"

Required

trigger.context["mapping"] = "<mapping_expression>"

A mapping expression defines how to convert an Elasticsearch document into a Hawkular Event.

Mapping expression syntax (BNF):

<mapping_expression> ::= <mapping> | <mapping> "," <mapping_expression>

<mapping> ::= <elasticsearch_field> [ "|" "'" <DEFAULT_VALUE> "'" ] ":" <hawkular_event_field>

<elasticsearch_field> ::= "index" | "id" | <SOURCE_FIELD>

<hawkular_event_field> ::= "id" | "ctime" | "dataSource" | "dataId" | "category" | "text" | "context" | "tags"

A minimum mapping for the "dataId" is required.
If a mapped field is not present in an Elasticsearch document, an empty value is used.
It is possible to define a default value for cases when the Elasticsearch field is not present.
Special Elasticsearch metafields "_index" and "_id" are supported under "index" and "id" labels.

i.e.
trigger.context["mapping"] = "level|'INFO':category,@timestamp:ctime,message:text,hostname:dataId,index:tags"

Optional

trigger.context["interval"] = "[0-9]+[smh]"

Defines the time interval between queries of the Elasticsearch system.
If not provided the default is "2m" (two minutes).

i.e.
trigger.context["interval"] = "30s"

performs a new query 30 seconds after completing the previous query, fetching documents generated since that time, using the timestamp field provided in trigger.context["timestamp"].

Optional

trigger.context["timestamp_pattern"] = "<date and time pattern>"

Defines a new time pattern for the trigger.context["timestamp"]. It must follow supported formats of java.text.SimpleDateFormat. If it is not present, it will expect default patterns:

"yyyy-MM-dd’T’HH:mm:ss.SSSSSSZ"
"yyyy-MM-dd’T’HH:mm:ssZ"

Optional

trigger.context["index"] = "<elastic_search_index>"

Defines the index or indexes where the documents will be queried. If not defined the query will search under all defined indexes.
Elasticsearch wildcards are supported.

Optional

trigger.context["filter"] = "<elastic_search_query_filter>"

By default the Elasticsearch Alerter performs a range query over the timestamp field provided in trigger.context["timestamp"].
This query accepts an additional filter in Elasticsearch format. The final query is built as:

{"query": {"constant_score": {"filter": {"bool": {"must": [<range_query_on_timestamp>, <elastic_search_query_filter>] }}}}}

Optional

trigger.context["url"]

Elasticsearch url can be defined in several ways in the alerter.
See Elasticsearch Security.

In addition to the methods defined, a trigger can override the url to which it connects.
The url can be a list of valid org.apache.http.HttpHost urls. By default it will point to

trigger.context["url"] = "http://localhost:9200"

i.e. a valid url could be
trigger.context["url"] = "http://host1:9200,http://host2:9200,http://host3:9200"

A complete example of an Elasticsearch trigger:

    {
      "trigger":{
        "id": "trigger-project",
        "name": "Project Logging Trigger",
        "description": "Alert on Project Logging (EFK infrastructure)",
        "severity": "HIGH",
        "enabled": true,
        "tags": {
          "Elasticsearch": "Demo ES instance"
        },
        "context": {
          "timestamp": "@timestamp",
          "interval": "30s",
          "index": "project.logging*",
          "mapping": "type|'Unknown':category,@timestamp:ctime,message:text,hostname:dataId,index:tags"
        },
        "actions":[
          {
            "actionPlugin": "elasticsearch",
            "actionId": "write-full-alert"
          },
          {
            "actionPlugin": "elasticsearch",
            "actionId": "write-partial-alert"
          },
          {
            "actionPlugin": "email",
            "actionId": "email-to-admins"
          }
        ]
      },
      "conditions":[
        {
          "type": "EVENT",
          "dataId": "192.168.122.198",
          "expression": "category == 'response'"
        }
      ]
    }

Elasticsearch Action Plugin

This plugin processes Actions by writing Events/Alerts into an Elasticsearch system.

The Elasticsearch plugin supports the following properties:

Property Description Default value

url

See Elasticsearch Security.

Indicate the Elasticsearch server or servers to connect to.

http://localhost:9200

index

Indicate the index where the Events/Alerts will be written.

alerts

type

Define the type under the index where the Events/Alerts will be written.

hawkular

transform

Define an optional transformation expression based on JOLT Shiftr format to convert an Event/Alert into a custom JSON format.

i.e.
{"tenantId":"tenant","ctime":"timestamp","dataId":"dataId","context":"context"}

JOLT Shiftr JavaDoc
JOLT Online tool

-

user

See Elasticsearch Security.

Username for Basic credential authentication.

-

pass

See Elasticsearch Security.

Password for Basic credential authentication.

-

forwarded-for

See Elasticsearch Security.

Used for X-Forwarded-For HTTP header.

-

proxy-remote-user

See Elasticsearch Security.

Used for X-Proxy-Remote-User HTTP header

-

token

See Elasticsearch Security.

Used for Bearer HTTP authentication

-

timestamp_pattern

Alerts and Events use timestamp formats for the ctime, stime, evalTimestamp and dataTimestamp fields.
If present, this property defines the output pattern of these fields when using a declared transform.
It must follow the supported formats of java.text.SimpleDateFormat.

-

Examples of Elasticsearch actions:

  "actions":[
    {
      "actionPlugin": "elasticsearch",
      "actionId": "write-full-alert",
      "properties": {
        "index": "alerts_full"
      }
    },
    {
      "actionPlugin": "elasticsearch",
      "actionId": "write-partial-alert",
      "properties": {
        "index": "alerts_summary",
        "timestamp.pattern": "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
        "transform": "{\"tenantId\":\"tenant\",\"ctime\":\"timestamp\",\"text\":\"trigger\",\"context\":{\"interval\":\"fetch-interval\"},\"evalSets\":\"details\"}"
      }
    }
  ]

Elasticsearch Security

Hawkular Elasticsearch integration is built on the Elasticsearch REST client.

There are several ways to configure a secure connection between Hawkular Alerting and Elasticsearch: Basic authentication, certificates or tokens.

Basic authentication can be defined on triggers and actions as described in the previous sections. For certificate or token authentication it is recommended to use the following system properties or ENV variables.

System property Environment variable Description

javax.net.ssl.trustStore

-

Location of the Java keystore file containing the collection of CA certificates trusted by this application process (trust store).

See JSSE Reference Guide

javax.net.ssl.trustStorePassword

-

Password used to access the trust store file specified by javax.net.ssl.trustStore.

See JSSE Reference Guide

javax.net.ssl.keyStore

-

Location of the Java keystore file containing an application process’s own certificate and private key.

See JSSE Reference Guide

javax.net.ssl.keyStorePassword

-

Password to access the private key from the keystore file specified by javax.net.ssl.keyStore.

See JSSE Reference Guide

hawkular-alerts.elasticsearch-url

ELASTICSEARCH_URL

Indicate the Elasticsearch server or servers to connect to.
The url can be a list of valid org.apache.http.HttpHost urls.

hawkular-alerts.elasticsearch-forwarded-for

ELASTICSEARCH_FORWARDED_FOR

Used for X-Forwarded-For HTTP header.

hawkular-alerts.elasticsearch-token

ELASTICSEARCH_TOKEN

Used for Bearer HTTP authentication.

hawkular-alerts.elasticsearch-proxy-remote-user

ELASTICSEARCH_PROXY_REMOTE_USER

Used for X-Proxy-Remote-User HTTP header.

Prometheus Integration

Prometheus Alerter

The Prometheus Alerter listens for triggers tagged with the "prometheus" tag. The alerter manages Prometheus evaluations and interacts with the Alerting system.

The Prometheus Alerter uses the following conventions for trigger tags and context:

Required

trigger.tags["prometheus"] // the value is ignored

Optional

trigger.context["prometheus.frequency"] = "<seconds between queries to Prometheus, default = 120>"

Note that the same frequency will apply to all Prometheus external conditions on the trigger.

Optional

trigger.context["prometheus.url"] = "<url, default = global setting>"

Required

An ExternalCondition must be defined to be processed by the Prometheus External Alerter:
externalcondition.alerterId = "prometheus"
externalcondition.expression = <BooleanExpression>
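
For illustration, a full condition entry might look like the following sketch (the dataId and the PromQL expression are hypothetical):

{
  "triggerMode": "FIRING",
  "type": "EXTERNAL",
  "alerterId": "prometheus",
  "dataId": "prometheus-test",
  "expression": "rate(http_requests_total[1m]) > 100"
}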

Prometheus Alerter Endpoints

GET

/hawkular/alerter/status

Status endpoint for Prometheus Alerter Integrations

POST

/hawkular/alerter/notifications

Consume Prometheus Alertmanager WebHook Notifications

Endpoint GET /hawkular/alerter/status

Ping status of the Prometheus Alerter

Response

Status codes

Status Code Reason Response Model

200

Prometheus Alerter is up and running

Success

Endpoint POST /hawkular/alerter/notifications

Receive a Prometheus Alertmanager WebHook Notification and store it as a Hawkular Event. Expects a format like:

{
  "receiver":"...",
  "status":"...",
  "alerts":[̣
    {
      "status":"...",
      "labels":{},
      "annotations":{},
      "startsAt":"...",
      "endsAt":"...",
      "generatorURL":"..."
    },
    {...},
    {...}
  ],
  "groupLabels":{},
  "commonAnnotations":{},
  "externalUrl":"...",
  "version":"...",
  "groupKey":"..."
}

A Prometheus Notification is mapped into a Hawkular Event with the following convention:

Prometheus Notification Hawkular Event

notification.groupLabels.tenant

event.tenantId

notification.groupLabels.dataId

event.dataId

"prometheus"

event.category

notification.commonAnnotations.description [or]
notification.commonAnnotations.summary [or]
notification.groupLabels.alertName

event.text

notification.groupLabels.alertName

event.context.alertName

notification

event.context.json

Body

Required Description Data Type

Yes

Prometheus Notification

Object

Response

Status codes

Status Code Reason Response Model

200

Successfully processed

Success

400

Bad Request/Invalid Parameters

[ApiError]

Apache Kafka Integration

Apache Kafka Alerter

The Kafka Alerter listens for triggers tagged with the "Kafka" tag. The Alerter creates a consumer for a Kafka topic with the config provided in the Trigger context, converts Kafka records into Hawkular Data or Events, and sends them to the Alerting engine.

The Kafka Alerter uses the following conventions for trigger tags and context:

Required

trigger.tags["Kafka"] = "<value reserved for future uses>"

The "Kafka" tag is required for the alerter to detect this trigger will listen to a Kafka topic.
Value is not necessary, it can be used as a description, it is reserved for future use.

i.e.
trigger.tags["Kafka"] = "" // Empty value is valid
trigger.tags["Kafka"] = "OpenShift Kafka System" // It can be used as description

Required

trigger.context["kafka.*"] = "<kafka native properties>"

Kafka Consumer properties are prefixed with "kafka." in the Trigger context.
kafka.key.deserializer and kafka.value.deserializer default to StringDeserializer if not present.

i.e.
trigger.context["kafka.bootstrap.servers"] = "localhost:9092"
trigger.context["kafka.group.id"] = "kafka-trigger-group"

See the KafkaConsumer reference for more info

Required

trigger.context["topic"] = "<kafka topic to listen>"

It defines the Kafka topic on which the trigger will listen. The current version allows only a single topic per Trigger.

Optional

trigger.context["poll_timeout"] = "<kafka consumer poll timeout>"

It defines the poll timeout in ms for the Kafka Consumer. The default is 1 second.

Optional

trigger.context["mapping"] = "<mapping_expression>"

By default, Kafka records are directly mapped into a Data object, with the following mapping

Kafka record timestamp → Data timestamp
Kafka record value → Data value
Kafka Topic → Data id (dataId referenced on conditions)

Optionally, a Kafka record can be mapped into an Event. To enable this, a "mapping" expression must be present in Trigger context.

Kafka Alerter expects a json record value.

A mapping expression defines how to convert a Kafka JSON record into a Hawkular Event:

<mapping_expression> ::= <mapping> | <mapping> "," <mapping_expression>

<mapping> ::= <kafka_record_field> [ "|" "'" <DEFAULT_VALUE> "'" ] ":" <hawkular_event_field>

<hawkular_event_field> ::= "id" | "ctime" | "dataSource" | "dataId" | "category" | "text" | "context" | "tags"

Optional

trigger.context["timestamp_pattern"] = "<date and time pattern>"

By default, an Event ctime field is mapped directly from a kafka record timestamp value. This can be optionally overwritten to use a json field of the Kafka record value.

This property defines a time pattern to parse ctime field. It must follow supported formats of
java.time.format.DateTimeFormatter.
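
Putting the conventions together, a Kafka trigger definition might look like the following sketch (servers, topic and threshold are hypothetical; note the condition dataId matches the topic name, per the default mapping):

{
  "trigger":{
    "id": "kafka-trigger",
    "name": "Kafka Metrics Trigger",
    "enabled": true,
    "tags": {
      "Kafka": ""
    },
    "context": {
      "kafka.bootstrap.servers": "localhost:9092",
      "kafka.group.id": "kafka-trigger-group",
      "topic": "metrics"
    }
  },
  "conditions":[
    {
      "type": "THRESHOLD",
      "dataId": "metrics",
      "operator": "GT",
      "threshold": 100.0
    }
  ]
}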

Apache Kafka Action Plugin

This plugin processes Actions by writing Events/Alerts into an Apache Kafka system.

The Apache Kafka plugin supports the following properties:

Property Description Default value

kafka

Used as the main prefix for Apache Kafka properties.
All kafka.<property> properties are passed to the Kafka producer, removing the kafka prefix.

Properties can be defined per action.
If no properties are defined at the action level, the default plugin properties are used.

For these special kafka properties, if no properties are defined at the action plugin level, the plugin will search at the system properties level.

-

topic

Used to indicate the topic where the Events/Alerts will be written.

-

transform

Define an optional transformation expression based on JOLT Shiftr format to convert an Event/Alert into a custom JSON format.

i.e.
{"tenantId":"tenant","ctime":"timestamp","dataId":"dataId","context":"context"}

JOLT Shiftr JavaDoc
JOLT Online tool

-

timestamp_pattern

Alerts and Events use timestamp formats for the ctime, stime, evalTimestamp and dataTimestamp fields.
If present, this property defines the output pattern of these fields when using a declared transform.
It must follow the supported formats of java.text.SimpleDateFormat.

-
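
An example action definition for this plugin might look like the following sketch (the plugin name "kafka" is an assumption, as this section does not state it explicitly; the other values are hypothetical):

"actions":[
  {
    "actionPlugin": "kafka",
    "actionId": "write-alerts-topic",
    "properties": {
      "kafka.bootstrap.servers": "localhost:9092",
      "topic": "alerts"
    }
  }
]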

REST API

Hawkular Alerting supports a robust REST API for managing Triggers, Alerts and Events.

© 2016 | Hawkular is released under Apache License v2.0