Notifications and Downtime

Overview

A monitoring system is more useful when it can send messages as soon as monitored equipment fails. Notifications and escalations connect the alerts raised when monitors detect a failure or an over-limit condition to effective communication with configured contacts.

If a threshold value is breached for a monitored attribute, an event is generated, and you can create notifications to warn you about these types of events. The decision to send notifications is made in the service check and host check logic, and happens when a hard state change occurs or when a host or service remains in a hard non-OK state. Each host and service definition has a contact group made up of contacts who receive notifications, and the contact filters determine whether a given contact is notified. As for notification methods, you can be notified of problems and recoveries in almost any way you want, including by cellphone, email, instant message, or audio alert.
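The decision logic described above can be sketched roughly as follows. This is an illustrative simplification in Python, not the Nagios or GroundWork implementation: the names `should_notify`, `Contact`, and `filter_contacts` are hypothetical, and real Nagios applies further conditions (notification intervals, time periods, flapping) that are omitted here.

```python
from dataclasses import dataclass


def should_notify(last_hard_state: str, state: str, state_type: str) -> bool:
    """Simplified notification decision (hypothetical helper).

    A notification is considered only for HARD states, either when the
    hard state has changed (including a recovery to OK) or when the
    object remains in a hard non-OK state.
    """
    if state_type != "HARD":
        return False              # soft states are still being rechecked
    if state != last_hard_state:
        return True               # hard state change: problem or recovery
    return state != "OK"          # still in a hard non-OK state


@dataclass(frozen=True)
class Contact:
    name: str
    notify_on: frozenset          # states this contact wants to hear about


def filter_contacts(group, state):
    """Apply each contact's filter; return the names that pass."""
    return [c.name for c in group if state in c.notify_on]


# Example contact group (made-up contacts for illustration only).
group = [
    Contact("alice", frozenset({"CRITICAL", "OK"})),
    Contact("bob", frozenset({"WARNING", "CRITICAL"})),
]

if should_notify("OK", "CRITICAL", "HARD"):
    recipients = filter_contacts(group, "CRITICAL")
```

Keeping the "should we notify at all" decision separate from the per-contact filter mirrors the two stages the text describes: the check logic decides first, then each contact's filters are applied.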

GroundWork Monitor makes full use of the notification and escalation features of Nagios. However, it is important to note that Nagios notifications are only effective for Nagios-monitored objects, and GroundWork includes monitoring that goes beyond Nagios. GroundWork has integrated the Notification Manager application (NoMa), which may be a preferable option for configuring system notifications, because a way was needed to send alerts for GDMA, for Cloud Hub discovered elements, and for syslog and trap sources external to Nagios. NoMa notifications and escalations work for both Nagios and non-Nagios configured elements.

Scheduling downtime can be very useful during system maintenance, because it suppresses notifications for the entities in downtime. The GroundWork Monitor Downtimes tool manages the scheduling of downtime for all monitored entities, including hosts, services, host groups, and service groups, for both one-time and recurring (e.g., daily, weekly, monthly, yearly) downtime.

During the specified downtime, alert notifications will not be sent out about the monitored entities. This is useful in the event of taking a server down for an upgrade or maintenance, etc. Scheduling downtime also avoids alarm fatigue, provides more accurate data for SLA reporting, and reinforces change control discipline. When the scheduled downtime expires, notifications for the hosts and services will resume as normal. Scheduled downtimes are preserved across program shutdowns and restarts.
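The suppression behavior can be illustrated with a small Python sketch. It assumes a hypothetical `Downtime` record with a fixed start and end; it does not reflect how the Downtimes tool stores its schedule, and recurring schedules are left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Downtime:
    entity: str        # host or service name (hypothetical identifier)
    start: datetime
    end: datetime


def in_downtime(downtimes, entity, at):
    """True if `entity` has a scheduled downtime covering time `at`."""
    return any(
        d.entity == entity and d.start <= at < d.end
        for d in downtimes
    )


def notification_allowed(downtimes, entity, at):
    """Notifications are suppressed while the entity is in downtime."""
    return not in_downtime(downtimes, entity, at)


# Example: a two-hour maintenance window for a made-up host.
schedule = [
    Downtime("web-01", datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 11, 0, 0)),
]

during = notification_allowed(schedule, "web-01", datetime(2024, 1, 10, 23, 0))
after = notification_allowed(schedule, "web-01", datetime(2024, 1, 11, 0, 30))
```

Because the window is checked at the time of each alert, notifications resume on their own once the downtime expires, which matches the behavior described above.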

NoMa subsystem and schema

GroundWork has integrated a free-standing notification and escalation subsystem, so Nagios is no longer required to alert contacts and contact groups. Cloud Hub bypasses Nagios for several reasons. First, it permits changes to the virtual server infrastructure to be made fluidly and automatically, because alerts no longer need to pass through a batch configuration commit process. Second, bypassing Nagios for alerts avoids the processing overhead associated with Nagios, which in some scenarios removes a capacity limitation.

A free-standing notification subsystem also permits changes to notification and escalation schedules to be made at run time by roles with lower privileges than the system administration role, which makes the system more flexible to operate. NoMa also incorporates typical business rules and conditions into its user interface, making them easier to understand and configure.

Notifications and escalations using Nagios remain available for configuration and maintenance, as shown by the dotted line in the image below. This permits customers who have invested time or scripting in Nagios alerts to continue using these methods. Alternatively, all versions of GroundWork since release 6.7 can be reconfigured so that Nagios alerts are sent directly to NoMa, which is configured to handle them, in order to gain the benefits described above.

As shown below, the alert flow coming from Cloud Hub and other non-Nagios discovered elements is processed by unique feeders that send alerts to the RESTful API for the Foundation database. In turn, NoMa subscribes to alert messages via the REST API to perform notifications and, if needed, escalations. This figure shows the relationship between Nagios, NoMa, and the GroundWork data management subsystem.

Figure: NoMa subsystem and schema

Using the NoMa front end, you can assign hosts and services to notifications. When NoMa receives notifications, either directly from Nagios or indirectly through the GroundWork REST API, it searches its database for matching host and service definitions. If a matching setup (a configured notification) is found, the escalation levels, receivers, and methods are determined and notifications are sent. The following diagram from the NoMa documentation has been modified to reflect the way the product is employed within GroundWork Monitor.

Figure: NoMa and GroundWork
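The matching flow described above (find a configured notification for the host and service, then determine the escalation level, receivers, and methods) can be sketched in Python. This is only a conceptual model: the `Rule` and `Escalation` structures, the wildcard patterns, and the count-based escalation are assumptions made for illustration and do not represent NoMa's actual database schema or code.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Escalation:
    from_count: int    # alert count at which this level starts to apply
    receivers: list    # contact names (hypothetical)
    methods: list      # e.g. "email", "sms" (hypothetical labels)


@dataclass
class Rule:
    host_pattern: str      # shell-style wildcard, assumed for this sketch
    service_pattern: str
    escalations: list


def resolve(rules, host, service, alert_count):
    """Return (receiver, method) pairs for an incoming alert."""
    deliveries = []
    for rule in rules:
        matches = (fnmatchcase(host, rule.host_pattern)
                   and fnmatchcase(service, rule.service_pattern))
        if not matches:
            continue
        # Use the highest escalation level already reached by this alert.
        reached = [e for e in rule.escalations if e.from_count <= alert_count]
        if not reached:
            continue
        level = max(reached, key=lambda e: e.from_count)
        for receiver in level.receivers:
            for method in level.methods:
                deliveries.append((receiver, method))
    return deliveries


# Example rule set with made-up hosts and contacts.
rules = [
    Rule("web-*", "*", [
        Escalation(1, ["oncall"], ["email"]),
        Escalation(3, ["manager"], ["email", "sms"]),
    ]),
]

first_alert = resolve(rules, "web-01", "http", 1)
escalated = resolve(rules, "web-01", "http", 4)
```

An alert that matches no rule yields an empty list, which corresponds to the case where no configured notification is found and nothing is sent.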

