How to determine Cloud Hub metrics to be monitored

REDIRECT: This URL has changed to https://support8.gwos.com/gw/gw8/latest/configuration/cloud-monitoring/customizing-metrics

This article explains how to determine the metrics to be monitored for a Cloud Hub connection. The examples refer to the Cloudera connector; however, the process is the same for all connectors. Before reading this article, you may want to review the articles Cloud Hub and How to configure Cloud Hub connectors.

You can always override a threshold for a particular host by editing it in the Status Summary.

Customizing Metrics

Metrics page

The Metrics page, accessible after the server credentials are established, is where you customize the lists of metrics gathered for a connection. Out of the box, a complete list of metrics is provided for a connector's services. You can customize these metric lists by adding or deleting metrics and by creating calculated metric fields called Synthetic metrics.

The Metrics page shows metrics grouped by Cluster, Host, and the connector service collections. The counts of metrics are displayed in the Group bar, summarized by:

  • Total metrics per group
  • Active metrics per group
  • Synthetic metrics per group

You can configure the metrics for any group by clicking on the group bar.

Figure: Group bar

Configuring metrics

Each row in the grid represents a metric. Metrics can be added, edited or deleted. 

A metric is considered inactive if it is not monitored (the Monitored checkbox is unchecked). 

You can edit a metric's options, display name, and thresholds directly in the grid, or click the Edit button to open the advanced metric dialog, which lets you configure all properties of a metric in a form. Click the Save button in the top navigation when you are done making changes.

Leaving the threshold fields blank will disable threshold triggers.

Figure: Configuring metrics

The grid displays the following fields:

Monitor? – Check this to enable monitoring of this metric.
Graph? – Check this to graph the values of this metric as a time series.
Delta? – If checked, the metric is a delta metric: each data point records the change since the previous data point was written (for example, the number of API calls made in that interval). If unchecked, the metric defaults to a gauge metric, where each data point represents an instantaneous measurement of a varying value.
Metric Name – The exact connector metric name or a connector metric expression. This field is read-only; click the Edit button to modify it.
Display Name – Overrides the metric name and stores the metric in GroundWork as a service with this name.
Warning Threshold – Metric value that triggers a GroundWork Warning alert.
Critical Threshold – Metric value that triggers a GroundWork Critical alert.
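To make the difference between the Delta and gauge settings concrete, here is a minimal Java sketch. It assumes cumulative readings of a hypothetical counter, such as total API calls, and shows the delta data points that would be recorded from them; the class and method names are illustrative only and are not part of Cloud Hub.

```java
import java.util.Arrays;

// Hypothetical illustration of the Delta option: cumulative readings of a counter
// (for example, total API calls so far) become delta data points, each holding the
// change since the previous reading. A gauge would report the readings unchanged.
class DeltaVsGauge {
    static long[] toDeltas(long[] cumulative) {
        // n readings yield n - 1 deltas; fewer than two readings yield none
        long[] deltas = new long[Math.max(0, cumulative.length - 1)];
        for (int i = 1; i < cumulative.length; i++) {
            deltas[i - 1] = cumulative[i] - cumulative[i - 1];
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] apiCallTotals = {100, 130, 175, 175}; // hypothetical sample readings
        System.out.println(Arrays.toString(toDeltas(apiCallTotals))); // [30, 45, 0]
    }
}
```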

Adding normal and synthetic metrics

To add new metrics, use the Add Normal Metric or Add Synthetic Metric button located at the bottom of the list.

Normal metrics

Normal metrics can be:

  • Single Metric Names
  • Computed Metric Names
  • Health Check metrics
  • Configuration metrics (not monitored)

Single metric names

Figure: Single metric name with display name

In this example, we have a Host metric named load_5. This is the unique name of the metric. In the Display Name field, we renamed this metric to HostLoad5Minute. Renaming metrics is optional; here we renamed load_5 so that a more descriptive name is displayed in the GroundWork Status viewer. We recommend filling out the Description field to describe the metric; this metric represents the host CPU load averaged over 5 minutes. We have also set up warning and critical thresholds. Note that, in this example, the metric is both monitored and graphed.

Use underscores, not dashes, in metric names. Dashes are not valid in variable names in a connector or synthetic expression, where a dash is read as subtraction.

As you type in the Metric Name field, the names of available metrics are auto-suggested. This helps ensure that you use a valid connector metric.

Computed normal metrics

Normal metrics can also be computed, in which case the metric name is an expression. They differ from Synthetic metrics in that the expression is computed on the connector server, not by Cloud Hub.

Figure: Normal metrics - computed connector expression

In this example, the Metric name is a computed connector expression. The expression includes two connector metrics: physical_memory_used and physical_memory_total. The expression takes the memory used metric, divides it by the total memory metric and multiplies that by 100 to return a computed metric named memory_usage_percent. The AS keyword is required. It defines an alias for the expression to uniquely name the metric:

(physical_memory_used / physical_memory_total) * 100 as memory_usage_percent

When working with computed metrics, always include the AS clause (alias) in the computed expression; aliases are required on computed metrics. In addition, the metric Display Name must match the alias.

The Metric Format String is an optional C-style formatting string. Here we limit the floating point number to 2 decimal places, and then append a percent sign to the computed metric value:

%.2f%%
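The following minimal Java sketch mirrors the arithmetic of this computed expression and the effect of the %.2f%% format string. It is not how Cloud Hub works internally (the connector server evaluates the real expression); the sample byte values and the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch: the arithmetic of the computed metric
//   (physical_memory_used / physical_memory_total) * 100 as memory_usage_percent
// followed by the format string %.2f%%. The connector server evaluates the real
// expression; this only mirrors the math and the formatting.
class MemoryUsagePercent {
    static double memoryUsagePercent(double physicalMemoryUsed, double physicalMemoryTotal) {
        return (physicalMemoryUsed / physicalMemoryTotal) * 100;
    }

    static String format(double value) {
        // %.2f keeps two decimal places; %% emits a literal percent sign
        return String.format(Locale.ROOT, "%.2f%%", value);
    }

    public static void main(String[] args) {
        // Hypothetical sample: 6 GiB used out of 16 GiB
        double percent = memoryUsagePercent(6_442_450_944.0, 17_179_869_184.0);
        System.out.println(format(percent)); // 37.50%
    }
}
```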

See the section below on Example Formatting for more examples.

As you type in the Metric Name field, the names of available metrics are auto-suggested. This helps ensure that you use valid connector metric names in your expression.

Health check metrics

Health Check metrics are a special type of metric that only report back Health Check status.

These metrics do not have numeric values, but instead have health check statuses that map to GroundWork statuses. 

Figure: Health check metrics 

Health Check metrics are flagged with the Health Check checkbox. As you type in the Metric Name field, the names of available Health Check metrics are auto-suggested. This helps ensure that you use a valid connector Health Check metric.

See the section below on Health Check Status Mappings for the complete list of health check status mappings.

Configuration metrics

Configuration metrics are only used in synthetic computations. They are not reported back to the GroundWork server. To create a configuration metric, simply do not check the Monitor checkbox.

Configuration metrics are useful in synthetic calculations that require a value, for example to perform a byte-to-megabyte or byte-to-gigabyte conversion, when you do not want to report the raw byte value back to the GroundWork server.

Figure: Configuration metrics 

The Monitor check box is left unchecked. Note that we still use thresholds, as they are useful in the Synthetic Expression evaluator.

Synthetic metrics

A Synthetic metric is a metric computed by Cloud Hub. It has one additional field that normal metrics do not have: the Metric Expression.

Figure: Synthetic metrics 

The synthetic metric name is a simple metric name conforming to the GroundWork service name requirements. No spaces are allowed. By convention, we name synthetic metrics with the prefix syn_.
The Metric Expression field contains the synthetic expression. In this example, we use the GroundWork function GW:GB2 to convert the value of the physical_memory_used metric from bytes to gigabytes:

GW:GB2(physical_memory_used)

The Metric Format String is an optional C-style formatting string. Here we limit the floating-point number to 2 decimal places:

%.2f

See the section below on Example Formatting for more examples.

As you type in the Metric Expression field, the names of available metrics are auto-suggested, along with all GroundWork functions. This helps ensure that you use valid connector metric names in your expression. Synthetic expressions are limited to the Normal metrics defined for the current group.

The synthetic expression

This field contains an actual programmable expression that is parsed by Cloud Hub. The expression is made up of:

  • Normal Metrics (not health checks)
  • Expression Operations (addition, subtraction, multiplication, division, and parentheses for grouping)
  • GroundWork Functions
  • Math Functions

The following example expression uses division and multiplication operators, parentheses for grouping, and conversion of integers to double values. The two normal Host metrics are fd_open and fd_max. Note that both normal metrics must be defined for this group; other synthetic metrics cannot be included in a synthetic expression.

(GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0

Connector metrics that represent measurements are typically floating-point (double) values, while counters, such as those in the example above, are usually integers or longs. Consult the connector documentation for a complete metrics reference, e.g., for Cloudera: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_metrics.html.

GroundWork functions support type conversion to both floating-point and integer numbers. See the GroundWork functions section below for a complete list.
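The reason for the GW:toDouble conversions is integer division: if both counters are integers, dividing them truncates the result before it is multiplied by 100.0. The Java sketch below shows the effect, assuming the expression evaluator follows Java-style arithmetic; the casts stand in for GW:toDouble, and the class name and sample values are hypothetical.

```java
// Hypothetical sketch of why (GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0
// converts its operands: with integer operands, the division truncates first.
class FdUsage {
    static double withoutConversion(long fdOpen, long fdMax) {
        return (fdOpen / fdMax) * 100.0; // integer division: 400 / 800 == 0
    }

    static double withConversion(long fdOpen, long fdMax) {
        // The casts play the role of GW:toDouble
        return ((double) fdOpen / (double) fdMax) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(withoutConversion(400, 800)); // 0.0
        System.out.println(withConversion(400, 800));    // 50.0
    }
}
```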

The expression evaluator

The synthetics dialog includes an expression evaluator, displayed at the bottom of the dialog, for testing your expressions before saving them. Each variable in the expression is given a value according to the evaluation option you select.

Given the expression:

(GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0

there are two variables, fd_open and fd_max. These variables are displayed in the Input Metric Values section of the dialog. There are three ways to evaluate the expression based on: 

  • Warning Threshold – Select the Warning Threshold option then click Evaluate
  • Critical Threshold – Select the Critical Threshold option then click Evaluate
  • Override – Enter values into the Override Value fields, then click Evaluate

In the image below, the fd_max and fd_open metric fields are predefined with warning threshold values of 800 and 400. Clicking Evaluate yields the formatted output: 50.00% used.

Figure: Warning threshold

In the image below, the fd_max and fd_open metric fields are predefined with critical threshold values of 1000 and 600. Clicking Evaluate yields the formatted output: 60.00% used.

Figure: Critical threshold

In the image below, the fd_max and fd_open metric fields are entered with values of 2000 and 1500. Clicking Evaluate yields the formatted output: 75.00% used.

Figure: Override values 
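The three evaluator runs described above can be reproduced with a short Java sketch. It assumes the synthetic metric's format string is %.2f%% used (consistent with the 50.00% used and 75.00% used outputs) and uses plain casts in place of GW:toDouble; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical re-creation of the evaluator runs for
// (GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0,
// assuming the format string "%.2f%% used".
class EvaluatorRuns {
    static String evaluate(long fdOpen, long fdMax) {
        double value = ((double) fdOpen / (double) fdMax) * 100.0;
        return String.format(Locale.ROOT, "%.2f%% used", value);
    }

    public static void main(String[] args) {
        System.out.println(evaluate(400, 800));   // Warning thresholds:  50.00% used
        System.out.println(evaluate(600, 1000));  // Critical thresholds: 60.00% used
        System.out.println(evaluate(1500, 2000)); // Override values:     75.00% used
    }
}
```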

Computed examples

Computed (normal) metrics are calculated in the connector server. Here are some examples of Cloudera computed metrics. 

Example 1: Computes the percentage of a host's physical memory in use

Computed Host Metric: (physical_memory_used / physical_memory_total) * 100 as memory_usage_percent 
Format: %.2f%% 
Display Name: memory_usage_percent 
Warning Threshold: 85 
Critical Threshold: 95 
Description: Host Physical Memory Used Percentage

Example 2: Converts bytes to MB for a host metric 

Computed Host Metric: physical_memory_used / 1048576 as memory_used_mb 
Display Name: memory_used_mb 
Warning Threshold: 8182 
Critical Threshold: 10240 
Description: Host Physical Memory Used in Megabytes

Example 3: Calculates Host CPU Load Percentage over 1 minute 

Computed Host Metric: cpu_user_rate / getHostFact(numCores, 1) * 100 as cpu_rate_user 
Display Name: cpu_rate_user 
Warning Threshold: 75 
Critical Threshold: 90 
Description: Host CPU Load Percentage over 1 Minute

Note that Cloudera currently has the following functions that can be used in a metric computation:

dt(metric) - Derivative with negative values. The change of the underlying metric expression, per second.

Example:

dt(jvm_gc_count)

dt0(metric) - Derivative where negative values are skipped (useful for dealing with counter resets). The change of the underlying metric expression, per second.

Example:

dt0(jvm_gc_time_ms) / 10
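The documented behavior of dt and dt0 can be illustrated with a small Java sketch that computes the per-second change between two consecutive samples. This is a hypothetical model, not Cloudera's implementation; in particular, representing a skipped dt0 value as an empty OptionalDouble is an assumption made for illustration, and the sample values are invented.

```java
import java.util.OptionalDouble;

// Hypothetical model of Cloudera's dt() and dt0(): the per-second change
// between two consecutive samples of a metric. Not Cloudera's implementation.
class Derivatives {
    // dt: rate of change per second; negative values are kept
    static double dt(double previous, double current, double elapsedSeconds) {
        return (current - previous) / elapsedSeconds;
    }

    // dt0: same rate, but a negative change (e.g. a counter reset) is skipped
    static OptionalDouble dt0(double previous, double current, double elapsedSeconds) {
        double rate = dt(previous, current, elapsedSeconds);
        return rate < 0 ? OptionalDouble.empty() : OptionalDouble.of(rate);
    }

    public static void main(String[] args) {
        System.out.println(dt(1200, 1500, 60));  // 5.0   (counter rose by 300 in 60 s)
        System.out.println(dt(1500, 30, 60));    // -24.5 (counter reset is reported)
        System.out.println(dt0(1500, 30, 60));   // OptionalDouble.empty (reset skipped)
    }
}
```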

getHostFact(string factName, double defaultValue) - Retrieves a fact about a host.

Example:

dt(total_cpu_user) / getHostFact(numCores, 2)

This example divides the results of dt(total_cpu_user) by the current number of cores for each host. If the number of cores cannot be determined, the default "2" will be used.

getHostFact currently supports one fact, numCores.
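A minimal Java sketch of the fallback behavior described above, assuming host facts are available as a simple name-to-value map. The map, class, and method names are hypothetical; Cloudera resolves host facts internally.

```java
import java.util.Map;

// Hypothetical sketch of getHostFact(numCores, default) as used in
// dt(total_cpu_user) / getHostFact(numCores, 2): look up a host fact and
// fall back to the default when the fact cannot be determined.
class HostFacts {
    static double getHostFact(Map<String, Double> facts, String factName, double defaultValue) {
        return facts.getOrDefault(factName, defaultValue);
    }

    // Per-core CPU rate, dividing by numCores or by the default of 2
    static double perCoreRate(double cpuUserRate, Map<String, Double> facts) {
        return cpuUserRate / getHostFact(facts, "numCores", 2);
    }

    public static void main(String[] args) {
        System.out.println(perCoreRate(3.2, Map.of("numCores", 8.0))); // 0.4 (8 cores known)
        System.out.println(perCoreRate(3.2, Map.of()));                // 1.6 (default of 2 used)
    }
}
```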

Synthetics examples

Example 1: Cloud Hub converts the physical memory used from bytes to GB with a GW function 

Metric Name: syn_gb_memory_used 
Expression: GW:GB2(physical_memory_used) 
Format: %.2f GB 
Warning Threshold: 8 
Critical Threshold: 10 
Description: Host Memory Used in GB

Example 2: Cloud Hub computes the percentage of file descriptors in use, using GW functions to convert integer values to double values 

Metric Name: syn_fd_usage 
Expression: (GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0 
Format: %.2f%% used 
Warning Threshold: 700 
Critical Threshold: 1000 
Description: Percentage of File Descriptors Used

Example 3: Cloud Hub computes the percentage of memory used with the divideToPercentage function. Note this function returns an integer. 

Metric Name: syn_physical_mem_percent 
Expression: GW:divideToPercentage(physical_memory_used,physical_memory_total) 
Format: %d %% used 
Description: Percentage of Host Memory Used
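As a hypothetical illustration of Example 3, the Java sketch below divides used by total memory, converts the ratio to an integer percentage, and applies the %d %% used format string. Rounding to the nearest integer is an assumption; the real GW:divideToPercentage may truncate instead. The class name and sample values are invented.

```java
import java.util.Locale;

// Hypothetical sketch of Example 3: GW:divideToPercentage(used, total)
// formatted with "%d %% used". Rounding to nearest is an assumption.
class SynPhysicalMemPercent {
    static int divideToPercentage(double dividend, double divisor) {
        return (int) Math.round(dividend / divisor * 100.0);
    }

    static String display(double used, double total) {
        // %d requires an integer argument, which is why this example uses %d, not %.2f
        return String.format(Locale.ROOT, "%d %% used", divideToPercentage(used, total));
    }

    public static void main(String[] args) {
        // 6 GiB used of 16 GiB is 37.5%, rounded to 38 under the rounding assumption
        System.out.println(display(6_442_450_944.0, 17_179_869_184.0)); // 38 % used
    }
}
```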

GroundWork functions

Table: Byte Conversion Functions Using Binary Multiples (1024) 

GW:KB(bytes)    Convert bytes to kilobytes
GW:MB(bytes)    Convert bytes to megabytes
GW:GB(bytes)    Convert bytes to gigabytes
GW:TB(bytes)    Convert bytes to terabytes

Table: Byte Conversion Functions Using Decimal Values (1000) 

GW:KB2(bytes)    Convert bytes to kilobytes
GW:MB2(bytes)    Convert bytes to megabytes
GW:GB2(bytes)    Convert bytes to gigabytes
GW:TB2(bytes)    Convert bytes to terabytes
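The two byte-conversion tables differ only in the divisor. The following Java sketch assumes the binary functions divide by powers of 1024 and the decimal functions by powers of 1000, as the table titles indicate; the method names are hypothetical stand-ins for the GW: functions.

```java
// Hypothetical sketch: binary (GW:KB ... GW:TB) versus decimal (GW:KB2 ... GW:TB2)
// byte conversions, assuming division by powers of 1024 and 1000 respectively.
class ByteConversions {
    // power: 1 = KB, 2 = MB, 3 = GB, 4 = TB
    static double binary(double bytes, int power)  { return bytes / Math.pow(1024, power); }
    static double decimal(double bytes, int power) { return bytes / Math.pow(1000, power); }

    public static void main(String[] args) {
        double bytes = 8_589_934_592.0;        // 8 GiB
        System.out.println(binary(bytes, 3));  // 8.0          (like GW:GB)
        System.out.println(decimal(bytes, 3)); // 8.589934592  (like GW:GB2)
    }
}
```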

Table: Minimum and Maximum Functions 

GW:min(x,y)    Returns the minimum value of two numbers
GW:max(x,y)    Returns the maximum value of two numbers

Table: Type Conversion 

GW:toDouble(m)     Converts a number to double precision
GW:toInteger(m)    Converts a number to an integer
GW:toLong(m)       Converts a number to a long integer

GW:scalePercentageUsed

This function provides percentage-used synthetic values.
It calculates the usage percentage for a given used metric and a corresponding available metric.
Both the used metric and the available metric can be scaled by corresponding scale factor parameters.

Example:

GW:scalePercentageUsed(summary.quickStats.overallMemoryUsage, summary.hardware.memorySize, 1.0, 1.0)

Parameters:
used - Represents a 'used' metric value of how much of this resource has been used such as 'overallMemoryUsage'
available - Represents the totality of a resource, such as all memory available
usedScaleFactor - multiply usage parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale
availableScaleFactor - multiply available parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale

Returns the percentage usage as an integer 
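A minimal Java sketch of the scaling behavior described for GW:scalePercentageUsed, under two assumptions: a null scale factor leaves its input unchanged, and the integer result is rounded to the nearest whole percent. The class name and sample values, including a used value reported in megabytes against a total in bytes, are hypothetical.

```java
// Hypothetical sketch of GW:scalePercentageUsed: scale each input
// (null or 1.0 means "no scaling"), then return used / available as an
// integer percentage. Rounding to the nearest integer is an assumption.
class ScalePercentage {
    static double scale(double value, Double factor) {
        return factor == null ? value : value * factor;
    }

    static int scalePercentageUsed(double used, double available,
                                   Double usedScaleFactor, Double availableScaleFactor) {
        double scaledUsed = scale(used, usedScaleFactor);
        double scaledAvailable = scale(available, availableScaleFactor);
        return (int) Math.round(scaledUsed / scaledAvailable * 100.0);
    }

    public static void main(String[] args) {
        // Used value in MB scaled to bytes (x 1048576), total already in bytes
        System.out.println(scalePercentageUsed(4096, 17_179_869_184.0, 1_048_576.0, 1.0)); // 25
        // No scaling on either input
        System.out.println(scalePercentageUsed(3, 4, null, null)); // 75
    }
}
```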

GW:scalePercentageUnused

This function provides percentage unused (free) synthetic values.
It calculates the unused (free) percentage for a given unused metric and a corresponding available metric.
Both the unused metric and the available metric can be scaled by corresponding scale factor parameters.

Example:

GW:scalePercentageUnused(summary.freeSpace, summary.capacity, 1.0, null, true)

Parameters:
unused - Represents a metric reference value of how much of this resource has not been used (free)
available - Represents the totality of a resource, such as all disk space available
usageScaleFactor - multiply the unused parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale
availableScaleFactor - multiply the available parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale

Returns the percentage not used (free) as an integer 

GW:percentageUsed

This function provides percentage-used synthetic values.
It calculates the usage percentage for a given used metric and a corresponding available metric.

Example:

GW:percentageUsed(summary.quickStats.overallMemoryUsage, summary.hardware.memorySize)

Parameters:
used - Represents a 'used' metric value of how much of this resource has been used such as 'overallMemoryUsage'
available - Represents the totality of a resource, such as all memory available

Returns the percentage usage as an integer 

GW:percentageUnused

This function provides percentage unused (free) synthetic values.
It calculates the unused (free) percentage for a given unused metric and a corresponding available metric.

Example:

GW:percentageUnused(summary.freeSpace, summary.capacity)

Parameters:
unused - Represents a metric reference value of how much of this resource has not been used (free)
available - Represents the totality of a resource, such as all disk space available

Returns the percentage not used (free) as an integer 

GW:divideToPercentage

Given two metrics, a dividend and a divisor, divides the dividend by the divisor and returns the ratio as a percentage.

Example:

GW:divideToPercentage(summary.quickStats.overallMemoryUsage,summary.hardware.memorySize)

Parameters:
dividend - typically a usage or free type metric
divisor - typically a totality type metric, such as total disk space

Returns the percentage ratio as an integer 

GW:toPercentage

Turns a number such as .87 into an integer percentage (87). Also handles rounding of percentages.

Example:

GW:toPercentage(summary.quickStats.overallMemoryUsage)

Parameters:
value - the value to be rounded to a full integer percentage

Returns the percentage value as an integer

Math functions

A sample of functions from the Java Math library:

  • min(n1,n2), max(n1,n2)
  • abs
  • cos, sin, tan, exp, log, sqrt
  • ceil, floor, round
  • rint
  • pow

See docs: https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html

Math functions should be prefixed with Math:, for example:

Math:abs(metric)

Example formatting

The formatting field uses standard C/Java style formatting strings. Typically, you will only be formatting one number, so the formatting strings should be very simple. Data types used are:

  • Integer Numbers %d
  • Floating Point Numbers %f

Example of formatting an integer value 2175:

Format String    Output
%d               2175
%05d             02175
%+5d             +2175
%,d              2,175
%d%% percent     2175% percent

Example of formatting a floating point value 3.141593:

Format String    Output
%f               3.141593
%.2f             3.14
%2.3f            3.142
%.2f%%           3.14%
%.2f percent     3.14 percent
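The outputs in both formatting tables can be checked with Java's String.format, which implements these format specifiers. Locale.US is passed explicitly so that the %,d grouping separator and the decimal point match the tables regardless of the system locale; the class and method names are illustrative.

```java
import java.util.Locale;

// Reproduces the integer and floating-point formatting tables with String.format.
// Locale.US is assumed so that %,d uses a comma and %f uses a period.
class FormatExamples {
    static String fmt(String pattern, Object value) {
        return String.format(Locale.US, pattern, value);
    }

    public static void main(String[] args) {
        String[] intPatterns = {"%d", "%05d", "%+5d", "%,d", "%d%% percent"};
        for (String p : intPatterns) {
            System.out.println(p + " -> " + fmt(p, 2175));
        }
        String[] floatPatterns = {"%f", "%.2f", "%2.3f", "%.2f%%", "%.2f percent"};
        for (String p : floatPatterns) {
            System.out.println(p + " -> " + fmt(p, 3.141593));
        }
    }
}
```

Note that %2.3f rounds to three decimal places (3.142); the width of 2 has no effect because the result is already longer than two characters.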

Normal metric discovery

When you type in the Metric Name field, metric names are auto-discovered and auto-suggested. Because there can be thousands of metrics, this feature is very useful for finding the right one.

Figure: Metric name 

Synthetic metric auto-suggest

Functions are available in the auto-suggestion list: 

Figure: Metric expression 

In addition, when you enter a synthetic expression, configured metrics are auto-suggested as you type in the Metric Expression field.

Figure: Auto suggest 

Health check status mappings

Health Check statuses are mapped to GroundWork monitor status values in the Status application based on the tables below:

Table: Cluster Status Mapping 

Connector Cluster Status    Mapped to GroundWork Host Status
UNKNOWN                     UNREACHABLE
NONE                        UNREACHABLE
STOPPED                     SUSPENDED
DOWN                        DOWN
UNKNOWN_HEALTH              WARNING
DISABLED_HEALTH             WARNING
CONCERNING_HEALTH           WARNING
BAD_HEALTH                  WARNING
GOOD_HEALTH                 UP
STARTING                    PENDING
STOPPING                    DOWN
HISTORY_NOT_AVAILABLE       WARNING

Table: Host and Connector Service Status Mapping 

Connector Host Status    Mapped to GroundWork Host Status
HISTORY_NOT_AVAILABLE    UNREACHABLE
NOT_AVAILABLE            UNREACHABLE
DISABLED                 SUSPENDED
GOOD                     UP
CONCERNING               WARNING
BAD                      DOWN

Table: Metric Status Mapping 

Connector Metric Status    Mapped to GroundWork Service Status
HISTORY_NOT_AVAILABLE      UNKNOWN
NOT_AVAILABLE              UNKNOWN
DISABLED                   PENDING
GOOD                       OK
CONCERNING                 WARNING
BAD                        CRITICAL
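If you process GroundWork statuses in your own tooling, the Metric Status Mapping can be encoded as a lookup table. This Java sketch is a hypothetical helper, not part of Cloud Hub; statuses that do not appear in the table are returned as null rather than guessed.

```java
import java.util.Map;

// Hypothetical helper encoding the Metric Status Mapping table:
// connector metric status -> GroundWork service status.
class MetricStatusMapping {
    static final Map<String, String> METRIC_TO_SERVICE = Map.of(
        "HISTORY_NOT_AVAILABLE", "UNKNOWN",
        "NOT_AVAILABLE", "UNKNOWN",
        "DISABLED", "PENDING",
        "GOOD", "OK",
        "CONCERNING", "WARNING",
        "BAD", "CRITICAL");

    static String toServiceStatus(String connectorStatus) {
        // Statuses not listed in the table yield null
        return METRIC_TO_SERVICE.get(connectorStatus);
    }
}
```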

Related resources