How to determine Cloud Hub metrics to be monitored
This article focuses on how to determine the metrics to be monitored for a Cloud Hub connection. Our example in this article refers to the Cloudera connector; however, the process is the same for all of the connectors. Prior to reading this article, you may want to review the articles Cloud Hub and How to configure Cloud Hub connectors.
You can always override a particular threshold on a particular host by editing it within the Status Summary.
Customizing Metrics
Metrics page
The Metrics page, accessible after the server credentials are established, is where you customize the lists of metrics being gathered for a connection. Out of the box, a complete list of metrics is provided for a connector's services. You can customize these metric lists by adding metrics, deleting metrics, and creating calculated metric fields called Synthetic metrics.
The Metrics page shows metrics grouped by Cluster, Host, and the connector service collections. The counts of metrics are displayed in the Group bar, summarized by:
- Total metrics per group
- Active metrics per group
- Synthetic metrics per group
You can configure the metrics for any group by clicking on the group bar.
Configuring metrics
Each row in the grid represents a metric. Metrics can be added, edited or deleted.
A metric is considered inactive if it is not monitored (the Monitored checkbox is unchecked).
You can directly edit a metric's options, display name, and thresholds in the grid, or use the advanced metric dialog by clicking the Edit button. The Edit dialog lets you configure all properties of a metric in a single form. Click the Save button in the top navigation when you are done making changes.
Leaving the threshold fields blank will disable threshold triggers.
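As a rough illustration of how warning and critical thresholds relate to alert states, the sketch below classifies a metric value. It assumes that a value at or above a threshold triggers the alert and that a blank threshold is represented as None. The function name classify and the comparison direction are assumptions made for illustration, not Cloud Hub's actual implementation.

```python
from typing import Optional


def classify(value: float,
             warning: Optional[float],
             critical: Optional[float]) -> str:
    """Return a status string for a metric value (hypothetical helper)."""
    # A blank (None) threshold disables that trigger.
    if critical is not None and value >= critical:
        return "CRITICAL"
    if warning is not None and value >= warning:
        return "WARNING"
    return "OK"


print(classify(90.0, 85.0, 95.0))  # WARNING
print(classify(99.0, None, None))  # OK
```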
The grid displays the following fields:
Field | Description |
---|---|
Monitor? | Check this if you want to enable monitoring of this metric. |
Graph? | Check this if you want to graph the values of this metric in time series |
Delta? | If checked, the metric is treated as a Delta metric: each data point records the change since the previous data point was written (for example, the number of API calls made in that interval). If unchecked, it defaults to a gauge metric, where each data point represents an instantaneous measurement of a varying value. |
Metric Name | The exact connector metric name or a connector metric expression. This field is read-only. Click the Edit button to modify it. |
Display Name | Overrides the metric name and stores the metric in GroundWork as a service with this name. |
Warning Threshold | Metric value that will trigger a GroundWork Warning alert. |
Critical Threshold | Metric value that will trigger a GroundWork Critical alert. |
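The difference between a Delta metric and a gauge metric can be sketched as follows. This assumes the connector returns a cumulative counter and that a delta data point is the difference between consecutive samples; the helper name to_deltas is hypothetical.

```python
from typing import List


def to_deltas(samples: List[int]) -> List[int]:
    """Convert cumulative counter samples into per-interval deltas."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]


counter_samples = [100, 130, 130, 175]  # cumulative counts (sample data)
print(to_deltas(counter_samples))       # [30, 0, 45]
```

A gauge metric would instead report each sample value as-is.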
Adding Normal and synthetic metrics
To add new metrics use the Add Normal Metric or Add Synthetic Metric buttons located at the bottom of the list.
Normal metrics
Normal metrics can be:
- Single Metric Names
- Computed Metric Names
- Health Checks metrics
- Configuration metrics (not monitored)
Single metric names
Figure: Single metric name with display name
In this example, we have a Host metric named load_5. This is the unique name of the metric. In the Display Name field, we renamed this metric to HostLoad5Minute. Renaming metrics is optional. In this case, we renamed load_5 so that a more descriptive metric name is displayed in the GroundWork Status viewer. We recommend filling out the description field to describe the metric. This metric represents the Host CPU Load averaged over 5 minutes. We have also set up warning and critical thresholds. Note that, for this example, the metric will be monitored and graphed.
We don't use dashes in metric names, only underscores. This is because dashes are not valid in variable names in a connector or synthetic expression.
As you type into the Metric Name field, the valid names of available metrics are auto-suggested. This ensures that you use a valid connector metric.
Computed normal metrics
Normal metrics can also be computed. They differ from Synthetic metrics in that the value of the metric is an expression, and it is computed on the connector server, not by Cloud Hub.
Figure: Normal metrics - computed connector expression
In this example, the Metric name is a computed connector expression. The expression includes two connector metrics: physical_memory_used and physical_memory_total. The expression takes the memory used metric, divides it by the total memory metric and multiplies that by 100 to return a computed metric named memory_usage_percent. The AS keyword is required. It defines an alias for the expression to uniquely name the metric:
(physical_memory_used / physical_memory_total) * 100 as memory_usage_percent
When working with computed metrics, make sure to include the AS clause (alias) in your computed expression. Aliases are required on computed metrics. Additionally, the metric Display name must match the alias.
The Metric Format String is an optional C-style formatting string. Here we limit the floating point number to 2 decimal places, and then append a percent sign to the computed metric value:
%.2f%%
See the section below on Example Formatting for more examples.
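To see what the computed expression and the format string produce together, the following sketch reproduces the arithmetic with made-up sample values. The real computation runs on the connector server; this only mirrors the result. Python's printf-style operator accepts the same %.2f%% string.

```python
physical_memory_used = 6_442_450_944    # sample value: 6 GiB in bytes
physical_memory_total = 17_179_869_184  # sample value: 16 GiB in bytes

# Same arithmetic as the computed connector expression.
memory_usage_percent = (physical_memory_used / physical_memory_total) * 100

# Two decimal places followed by a literal percent sign.
formatted = "%.2f%%" % memory_usage_percent
print(formatted)  # 37.50%
```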
As you type into the Metric Name field, the valid names of available metrics are auto-suggested. This ensures that you use a valid connector metric name in your expression.
Health check metrics
Health Check metrics are a special type of metric that only report back Health Check status.
These metrics do not have numeric values, but instead have health check statuses that map to GroundWork statuses.
Figure: Health check metrics
Health Check metrics are flagged with the Health Check checkbox. As you type into the Metric Name field, the valid names of available Health Check metrics are auto-suggested. This ensures that you use a valid connector Health Check metric.
See the section below on Health Check Status Mappings for the complete list of health check status mappings.
Configuration metrics
Configuration metrics are only used in synthetic computations. They are not reported back to the GroundWork server. To create a configuration metric, simply do not check the Monitor checkbox.
Configuration metrics are used in synthetic calculations where the value is required, for example to perform a bytes-to-megabytes or bytes-to-gigabytes conversion, but you do not want to report the byte value back to the GroundWork server.
Figure: Configuration metrics
The Monitor check box is left unchecked. Note that we still use thresholds, as they are useful in the Synthetic Expression evaluator.
Synthetic metrics
A Synthetic metric is a metric that is computed by Cloud Hub. It has one additional field, expression, that normal metrics do not have.
Figure: Synthetic metrics
The synthetic metric name is a simple metric name conforming to the GroundWork service name requirements. No spaces are allowed. By convention, we name synthetic metrics with the prefix syn_.
The Metric Expression field contains the synthetic expression. In this example, we use a GroundWork function, GW:GB2, to convert the value of the physical_memory_used metric from bytes to gigabytes:
GW:GB2(physical_memory_used)
The Metric Format String is an optional C-style formatting string. Here we limit the floating-point number to 2 decimal places:
%.2f
See the section below on Example Formatting for more examples.
As you type into the Metric Expression field, the valid names of available metrics are auto-suggested. This ensures that you use a valid connector metric name in your expression. Synthetic expressions are limited to the Normal metrics defined for the current group. Additionally, the auto-suggest feature displays all GroundWork functions.
The synthetic expression
This field contains an actual programmable expression that is parsed by Cloud Hub. The expression is made up of:
- Normal Metrics (not health checks)
- Expression Operations (addition, subtraction, multiplication, division, parentheses for grouping)
- GroundWork Functions
- Math Functions
The following example expression uses division and multiplication operators, parentheses for grouping, and conversion of integer values to double values. The two normal Host metrics are fd_open and fd_max. Note that both normal metrics must be defined for this group. Other synthetic metrics cannot be included in a synthetic expression.
(GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0
The data types of connector metrics are typically floating point (double) values for measurements. Counters, like those in the example above, are usually integers or longs. Consult the connector documentation for a complete reference guide to metrics, e.g., Cloudera https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_metrics.html.
Type conversion is supported as GroundWork functions for both floating point and integer numbers. See the section below GroundWork functions for a complete list.
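The reason for converting counters to doubles before dividing can be illustrated as below. This sketch assumes the expression evaluator applies Java-style integer division when both operands are integers, which is an assumption and is not confirmed by this article. Python's // operator stands in for that behavior.

```python
fd_open = 400  # sample integer counter
fd_max = 800   # sample integer counter

# Integer division truncates 400 / 800 to 0 before the multiplication.
truncated = (fd_open // fd_max) * 100.0

# Converting to floating point first keeps the fractional part.
precise = (float(fd_open) / float(fd_max)) * 100.0

print(truncated)  # 0.0
print(precise)    # 50.0
```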
The expression evaluator
The synthetics dialog has an expression evaluator to try out and test your expressions before saving them. The evaluator is displayed at the bottom of the dialog. Each variable in the expression is evaluated based on the check boxes.
Given the expression:
(GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0
there are two variables, fd_open and fd_max. These variables are displayed in the Input Metric Values section of the dialog. There are three ways to evaluate the expression based on:
- Warning Threshold – Select the Warning Threshold option then click Evaluate
- Critical Threshold – Select the Critical Threshold option then click Evaluate
- Override – Enter values into the Override Value fields, then click Evaluate
In the image below, the fd_max and fd_open metric fields are predefined with warning threshold values of 800 and 400. Clicking Evaluate yields the formatted output: 50.00% used.
Figure: Warning threshold
In the image below, the fd_max and fd_open metric fields are predefined with critical threshold values of 1000 and 600. Clicking Evaluate yields the formatted output: 60.00%.
Figure: Critical threshold
In the image below, the fd_max and fd_open metric fields are entered with values of 2000 and 1500. Clicking Evaluate yields the formatted output: 75.00% used.
Figure: Override values
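The three evaluation modes can be mimicked with the figures quoted above. The function name and the format string %.2f%% used are assumptions made for illustration (the format is inferred from the output shown for the warning and override cases); this is not the evaluator's own code.

```python
def evaluate(fd_open: float, fd_max: float, fmt: str = "%.2f%% used") -> str:
    """Evaluate the file descriptor usage expression and format the result."""
    value = (float(fd_open) / float(fd_max)) * 100.0
    return fmt % value


print(evaluate(400, 800))    # warning threshold values  -> 50.00% used
print(evaluate(600, 1000))   # critical threshold values -> 60.00% used
print(evaluate(1500, 2000))  # override values           -> 75.00% used
```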
Computed examples
Computed (normal) metrics are calculated in the connector server. Here are some examples of Cloudera computed metrics.
Example 1: Computes the memory usage of physical memory of a host
Computed Host Metric: (physical_memory_used / physical_memory_total) * 100 as memory_usage_percent
Format: %.2f%%
Display Name: memory_usage_percent
Warning Threshold: 85
Critical Threshold: 95
Description: Host Physical Memory Used Percentage
Example 2: Converts bytes to MB for a host metric
Computed Host Metric: physical_memory_used / 1048576 as memory_used_mb
Display Name: memory_used_mb
Warning Threshold: 8182
Critical Threshold: 10240
Description: Host Physical Memory Used in Megabytes
Example 3: Calculates Host CPU Load Percentage over 1 minute
Computed Host Metric: cpu_user_rate / getHostFact(numCores, 1) * 100 as cpu_rate_user
Display Name: cpu_rate_user
Warning Threshold: 75
Critical Threshold: 90
Description: Host CPU Load Percentage over 1 Minute
Note that Cloudera currently has the following functions that can be used in a metric computation:
dt(metric) - Derivative with negative values. The change of the underlying metric expression, per second.
Example:
dt(jvm_gc_count)
dt0(metric) - Derivative where negative values are skipped (useful for dealing with counter resets). The change of the underlying metric expression, per second.
Example:
dt0(jvm_gc_time_ms) / 10
getHostFact(string factName, double defaultValue) - Retrieves a fact about a host.
Example:
dt(total_cpu_user) / getHostFact(numCores, 2)
This example divides the results of dt(total_cpu_user) by the current number of cores for each host. If the number of cores cannot be determined, the default "2" will be used.
getHostFact currently supports one fact, numCores.
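The semantics described for dt and dt0 can be sketched as below. This illustrates the descriptions above (per-second change, with dt0 skipping negative values) and is not Cloudera's implementation. The sample layout of (timestamp in seconds, value) pairs and the Python function names are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def dt(samples: List[Sample]) -> List[float]:
    """Per-second change between consecutive samples, negatives included."""
    return [(v1 - v0) / (t1 - t0)
            for (t0, v0), (t1, v1) in zip(samples, samples[1:])]


def dt0(samples: List[Sample]) -> List[float]:
    """Like dt, but negative rates (such as counter resets) are skipped."""
    return [rate for rate in dt(samples) if rate >= 0]


samples = [(0, 100), (60, 160), (120, 10), (180, 70)]  # counter reset at t=120
print(dt(samples))   # [1.0, -2.5, 1.0]
print(dt0(samples))  # [1.0, 1.0]
```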
Synthetics examples
Example 1: Cloud Hub computes the physical memory used from bytes to GB with GW function
Metric Name: syn_gb_memory_used
Expression: GW:GB2(physical_memory_used)
Format: %.2f GB
Warning Threshold: 8
Critical Threshold: 10
Description: Host Memory Used in GB
Example 2: Cloud Hub computes the percentage of file descriptors used, with GW functions to convert integer values to double values
Metric Name: syn_fd_usage
Expression: (GW:toDouble(fd_open) / GW:toDouble(fd_max)) * 100.0
Format: %.2f%%
Warning Threshold: 700
Critical Threshold: 1000
Description: Percentage of File Descriptors Used
Example 3: Cloud Hub computes the percentage of memory used with the divideToPercentage function. Note this function returns an integer.
Metric Name: syn_physical_mem_percent
Expression: GW:divideToPercentage(physical_memory_used,physical_memory_total)
Format: %d %% used
Description: Percentage of Host Memory Used
GroundWork functions
Table: Byte Conversion Functions Using Strict Binary Values (1024..)
Function | Description |
---|---|
GW:KB(bytes) | Convert bytes to kilobytes |
GW:MB(bytes) | Convert bytes to megabytes |
GW:GB(bytes) | Convert bytes to gigabytes |
GW:TB(bytes) | Convert bytes to terabytes |
Table: Byte Conversion Functions Using Decimal Values (1000..)
Function | Description |
---|---|
GW:KB2(bytes) | Convert bytes to kilobytes |
GW:MB2(bytes) | Convert bytes to megabytes |
GW:GB2(bytes) | Convert bytes to gigabytes |
GW:TB2(bytes) | Convert bytes to terabytes |
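The difference between the two families of conversion functions comes down to the divisor. A sketch, assuming each function simply divides by the corresponding power of 1024 or 1000 (the Python functions gb and gb2 are stand-ins for GW:GB and GW:GB2, not their actual implementations):

```python
def gb(num_bytes: float) -> float:
    """Stand-in for GW:GB: gigabytes based on multiples of 1024."""
    return num_bytes / 1024 ** 3


def gb2(num_bytes: float) -> float:
    """Stand-in for GW:GB2: gigabytes based on multiples of 1000."""
    return num_bytes / 1000 ** 3


physical_memory_used = 8_589_934_592  # sample value: 8 * 1024**3 bytes
print("%.2f" % gb(physical_memory_used))   # 8.00
print("%.2f" % gb2(physical_memory_used))  # 8.59
```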
Table: Minimum and Maximum Functions
Function | Description |
---|---|
GW:min(x,y) | Returns the minimum value of two numbers |
GW:max(x,y) | Returns the maximum value of two numbers |
Table: Type Conversion
Function | Description |
---|---|
GW:toDouble(m) | Converts a number to double precision |
GW:toInteger(m) | Converts a number to an integer |
GW:toLong(m) | Converts a number to a long integer |
GW:scalePercentageUsed
This function provides percentage usage synthetic values.
Calculates the usage percentage for a given used metric and a corresponding available metric.
Both the used metric and available metric can be scaled by corresponding scale factor parameters.
Example:
GW:scalePercentageUsed(summary.quickStats.overallMemoryUsage, summary.hardware.memorySize, 1.0, 1.0)
Parameters:
used - Represents a 'used' metric value of how much of this resource has been used such as 'overallMemoryUsage'
available - Represents the totality of a resource, such as all memory available
usedScaleFactor - multiply usage parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale
availableScaleFactor - multiply available parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale
Returns the percentage usage as an integer
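The calculation described for GW:scalePercentageUsed can be sketched as follows. This is a hypothetical stand-in: the Python function name, the use of None for a null scale factor, and rounding to the nearest integer are assumptions, since the article only states that an integer is returned.

```python
from typing import Optional


def scale_percentage_used(used: float, available: float,
                          used_scale: Optional[float] = None,
                          available_scale: Optional[float] = None) -> int:
    """Hypothetical stand-in for GW:scalePercentageUsed."""
    # None means "do not scale", which is equivalent to passing 1.0.
    scaled_used = used * (used_scale if used_scale is not None else 1.0)
    scaled_available = available * (
        available_scale if available_scale is not None else 1.0)
    # Rounding to the nearest integer is an assumption.
    return round(scaled_used / scaled_available * 100)


# Sample data: usage reported in MB, capacity in bytes, so scale usage to bytes.
print(scale_percentage_used(4096, 17_179_869_184, 1024 * 1024, 1.0))  # 25
```

Scale factors are useful when the two metrics are reported in different units, as in the sample call.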
GW:scalePercentageUnused
This function provides percentage unused/free synthetic values.
Calculates the unused (free) percentage for a given unused metric and a corresponding available metric.
Both the unused metric and available metric can be scaled by corresponding scale factor parameters.
Example:
GW:scalePercentageUnused(summary.freeSpace, summary.capacity, 1.0, null, true)
Parameters:
unused - Represents a metric reference value of how much of this resource has not been used (free)
available - Represents the totality of a resource, such as all disk space available
usageScaleFactor - multiply usage parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale
availableScaleFactor - multiply available parameter by this value, or pass in null to not scale. Passing in 1.0 will also not scale
Returns the percentage not used (free) as an integer
GW:percentageUsed
This function provides percentage usage synthetic values.
Calculates the usage percentage for a given used metric and a corresponding available metric.
Example:
GW:percentageUsed(summary.quickStats.overallMemoryUsage, summary.hardware.memorySize)
Parameters:
used - Represents a 'used' metric value of how much of this resource has been used such as 'overallMemoryUsage'
available - Represents the totality of a resource, such as all memory available
Returns the percentage usage as an integer
GW:percentageUnused
This function provides percentage unused/free synthetic values.
Calculates the unused (free) percentage for a given unused metric and a corresponding available metric.
Example:
GW:percentageUnused(summary.freeSpace, summary.capacity)
Parameters:
unused - Represents a metric reference value of how much of this resource has not been used (free)
available - Represents the totality of a resource, such as all disk space available
Returns the percentage not used (free) as an integer
GW:divideToPercentage
Given two metrics, a dividend and a divisor, divides them and returns a percentage ratio.
Example:
GW:divideToPercentage(summary.quickStats.overallMemoryUsage,summary.hardware.memorySize)
Parameters:
dividend - typically a usage or free type metric
divisor - typically a totality type metric, such as total disk space
Returns the percentage ratio as an integer
GW:toPercentage
Turns a number such as .87 into an integer percentage (87). Also handles rounding of percentages
Example:
GW:toPercentage(summary.quickStats.overallMemoryUsage)
Parameters:
value - the value to be rounded to a full integer percentage
Returns the percentage value as an integer
Math functions
A sample of the functions available from the Java Math library:
- min(n1,n2), max(n1,n2)
- abs
- cos, sin, tan, exp, log, sqrt
- ceil, floor, round
- rint
- pow
See docs: https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html
Math functions should be prefixed by:
Math:
Example:
Math:abs(metric)
Example formatting
The formatting field uses standard C/Java style formatting strings. Typically, you will only be formatting one number, so the formatting strings should be very simple. Data types used are:
- Integer Numbers %d
- Floating Point Numbers %f
Example of formatting an integer value 2175:
Format String | Output |
---|---|
%d | 2175 |
%05d | 02175 |
%+5d | +2175 |
%,d | 2,175 |
%d%% percent | 2175% percent |
Example of formatting a floating point value 3.141593:
Format String | Output |
---|---|
%f | 3.141593 |
%.2f | 3.14 |
%2.3f | 3.142 |
%.2f%% | 3.14% |
%.2f percent | 3.14 percent |
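Most of these format strings behave the same way in any printf-style formatter, so they can be tried out quickly outside Cloud Hub. The sketch below uses Python's % operator as a stand-in. The %,d grouping flag belongs to Java's formatter and is not supported by Python's % operator, so it is left out here.

```python
integer_value = 2175
float_value = 3.141593

# Each entry pairs a format string with the value it is applied to.
examples = [
    ("%d", integer_value),
    ("%05d", integer_value),
    ("%+5d", integer_value),
    ("%d%% percent", integer_value),
    ("%.2f", float_value),
    ("%.2f%%", float_value),
]

results = [fmt % value for fmt, value in examples]
for (fmt, _), output in zip(examples, results):
    print(fmt, "->", output)
```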
Normal metric discovery
When you enter a metric name in the Metric Name field, metrics are auto-discovered and matching names are auto-suggested as you type. There can be thousands of metrics, so the auto-discovery feature can be very useful in finding the right one.
Figure: Metric name
Synthetic metric auto suggest
Functions are available in the auto-suggestion list:
Figure: Metric expression
Also, when entering a synthetic expression, configured metrics will be auto-suggested as you type into the Metric Expression field.
Figure: Auto suggest
Health check status mappings
Health Check statuses are mapped to GroundWork monitor status values in the Status application based on the tables below:
Table: Cluster Status Mapping
Connector Cluster Status | Mapped to GroundWork Host Status |
---|---|
UNKNOWN | UNREACHABLE |
NONE | UNREACHABLE |
STOPPED | SUSPENDED |
DOWN | DOWN |
UNKNOWN_HEALTH | WARNING |
DISABLED_HEALTH | WARNING |
CONCERNING_HEALTH | WARNING |
BAD_HEALTH | WARNING |
GOOD_HEALTH | UP |
STARTING | PENDING |
STOPPING | DOWN |
HISTORY_NOT_AVAILABLE | WARNING |
Table: Host and Connector Service Status Mapping
Connector Host Status | Mapped to GroundWork Host Status |
---|---|
HISTORY_NOT_AVAILABLE | UNREACHABLE |
NOT_AVAILABLE | UNREACHABLE |
DISABLED | SUSPENDED |
GOOD | UP |
CONCERNING | WARNING |
BAD | DOWN |
Table: Metric Status Mapping
Connector Metric Status | Mapped to GroundWork Service Status |
---|---|
HISTORY_NOT_AVAILABLE | UNKNOWN |
NOT_AVAILABLE | UNKNOWN |
DISABLED | PENDING |
GOOD | OK |
CONCERNING | WARNING |
BAD | CRITICAL |
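The Metric Status Mapping table amounts to a simple lookup. A sketch, with the dictionary and function names chosen for illustration; falling back to UNKNOWN for a status that is not listed is an assumption.

```python
# Connector metric status -> GroundWork service status, per the table above.
METRIC_STATUS_MAP = {
    "HISTORY_NOT_AVAILABLE": "UNKNOWN",
    "NOT_AVAILABLE": "UNKNOWN",
    "DISABLED": "PENDING",
    "GOOD": "OK",
    "CONCERNING": "WARNING",
    "BAD": "CRITICAL",
}


def to_groundwork_service_status(connector_status: str) -> str:
    """Map a connector health check status to a GroundWork service status."""
    # Assumption: statuses missing from the table are reported as UNKNOWN.
    return METRIC_STATUS_MAP.get(connector_status, "UNKNOWN")


print(to_groundwork_service_status("CONCERNING"))  # WARNING
```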
Related resources
- Cloud Hub (Documentation)
- Cloud Hub troubleshooting (Knowledge Base)
- Ownership options (Documentation)
- Transit Connection Generator (TCG) (Documentation)