Background of Auto Setup
This page provides detailed information about the inner workings of GDMA Auto Setup.
Auto Setup Sequencing
This section describes the actions taken for Auto-Setup, in successively greater detail.
Top Level Actions
GDMA Auto-Setup occurs in multiple phases:
- Manual construction of auto-setup files (instructions and trigger files) on the GroundWork server. You as an administrator keep your master copies of these files in your own repository, outside of the GroundWork product. And of course, you back up that repository on a regular basis.
- Installation of the instructions and trigger files where the GDMA client can find them and retrieve them at the start of its next polling cycle. This installation is generally done via the autosetup tool, which handles all the details.
- A pass of probing resources on the GDMA client ("auto-discovery") according to the given instructions, with the results sent back to the server.
- Analysis of the resource-discovery results, and possible reconfiguration of the monitoring setup for that GDMA host.
- Automatic re-building of service externals for that GDMA host when its configuration has changed.
- The GDMA client begins monitoring immediately with updated externals.
- Nagios will not recognize changes to monitored inventory on the GDMA client until the next time a Monarch Commit is run on the server.
Client-server Interactions
Here's what happens at the level of specific actions taken by the GDMA client and the GroundWork server. Since discovery passes can occur throughout the lifetime of a GDMA client-daemon run, not just immediately after installation, we do not restrict our attention here to just the actions taken in the first round of operation.
- On initial startup of the GDMA poller:
- The poller wakes up and reads its local config files.
- If auto-setup has been configured for this machine (via the Enable_Auto_Setup directive), the poller runs a pass of auto-setup, including potentially downloading trigger and instructions files from the server, running discovery, sending results to the server, and getting back a response. Actually running discovery only happens if it is warranted at this time. All of that action is designed to run early, so when the poller downloads externals from the server, they will accurately reflect from the get-go exactly what this client should be monitoring.
- The poller tries to fetch externals from the server, both to obtain them if it does not already have them in hand, and to obtain an updated copy if they have been updated on the server since the poller last ran.
- The poller runs a cycle of executing service checks, based on whatever is present in the externals file (if it has one in hand).
- On subsequent polling cycles, in general:
- The poller runs a cycle of executing service checks, based on whatever is present in the externals file (if it has one in hand).
- Every ConfigFile_Pull_Cycle polling cycles (typically, this directive is set to 1, so these actions occur on every cycle; a minimal sketch of the client-side directives involved appears just after this list):
- If auto-setup has been configured for this machine, the poller runs a pass of auto-setup, as it would have when it first woke up. This may or may not result in a pass of discovery being run.
- The poller tries to fetch externals from the server, to make sure its local copy is up-to-date.
- The poller runs a cycle of executing service checks, based on whatever is present in the externals file (if it has one in hand).
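For reference, here is a minimal sketch of the client-side directives involved in the behavior just described. The values shown are typical rather than mandated, and the exact quoting conventions should be checked against your own GDMA client configuration:

Enable_Auto_Setup = "on"
ConfigFile_Pull_Cycle = "1"

With ConfigFile_Pull_Cycle at 1, the auto-setup and externals-fetch steps are attempted on every polling cycle.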
Each pass of auto-setup, whether run at daemon startup or in some later polling cycle, looks like this:
- The GDMA client will reach over to the server, looking for an auto-discovery trigger file and an auto-discovery instructions file for this host.
- If the client finds those files for this host on the server, it will download them and check whether a pass of discovery is warranted at this time.
- If the client has local copies of those two files, the trigger timestamp is later than the instructions timestamp, and no discovery has been run since the trigger timestamp, a pass of discovery is now warranted.
- Otherwise, discovery is skipped at this time, and there is nothing more to do for auto-setup in this polling cycle.
- If a pass of discovery is warranted, run auto-discovery actions as described in the trigger and instructions files.
- The auto-discovery processing will result in some set of auto-discovery results. Those results will be sent to the server as part of an auto-discovery results packet.
- The server responds to the auto-discovery packet by:
- Analyzing the packet to ensure that it is valid.
- If so instructed in the trigger, configuring the GDMA client in Monarch. That may happen as a dry run, which is immediately reverted. Or it may happen as a live action, wherein changes are permanently stored in the configuration data.
- Applying a host profile and possibly service profiles, as listed by the GDMA client in the auto-discovery results.
- Customizing and applying individual services, and creating service instances as necessary, to reflect details of the auto-discovery results.
- If permanent changes were made to the configuration database, building externals for the GDMA client, placing them in a standard location on the server known to the GDMA client.
- Returning a success/failure indicator to the GDMA client.
- If permanent changes were made to the configuration database, returning as well the hostname by which the GDMA client is now registered.
- The GDMA client records the fact that a pass of discovery has been run, and that will block further use of the local copy of the trigger file.
- If permanent configuration succeeded, the GDMA client will adopt the hostname sent back by the server and use it for all further interaction with the server, both in fetching files from the server and in reporting check results. The hostname will be reflected as well in the logfile names.
Key to all of this processing is the auto-discovery instructions file, which must be defined by the customer to reflect the kinds of resources that might be present in their own infrastructure and need monitoring. In brief, it tells the GDMA client what resources to look for and how to look for them, what host profiles, service profiles, and services to apply if they are found, and how all of that setup should be customized based on the details of the discovery matching on that particular host.
How Discovery Works
Let's focus in on the middle part of Auto-Setup sequencing. Here is the sequence of actions that occur when auto-discovery on a GDMA client processes an instructions file. In the Configuration Reference for Auto Setup page, we describe the individual sensor directives that we mention here.
- The instructions file format_version is checked, to verify that it is a version that the code knows how to handle.
- Each sensor definition is validated for correct construction.
- If a sensor definition is not enabled, all further processing of that sensor is skipped.
- The sensor type is probed or scanned, with limitations imposed as selected by the resource directive.
- The sensor pattern is applied to whatever the filtered scanning produces, resulting in some number of matches.
- If no pattern matches are found, this sensor contributes nothing to the successful discovery results. This does not fail the discovery as a whole.
- If some pattern matches are found, the number of matches is compared against the chosen cardinality of the sensor results.
  - If there is more than one match and the cardinality is single, the sensor fails and the discovery as a whole fails.
  - If there is more than one match and the cardinality is first, the sensor succeeds and only the first match is processed further. Note that the order of processing is often not guaranteed, so what constitutes the first match may be somewhat arbitrary, depending on the sensor type, resource filtering, and pattern matching.
  - If the cardinality is multiple and some pattern matches are found, a separate service instance will be created in the discovery results for each match.
- Each accepted pattern match is processed in turn. Captured strings are first run through however many transliteration transforms are specified, in order as presented in the sensor definition, and then through sanitization if that is specified. This creates corresponding $MATCHEDn$ and $SANITIZEDn$ macro values, one such pair from each captured string, from the raw pattern matching and after sanitization. n is a decimal number that refers to the nth captured string from one match of the pattern; it is not a count of how many times the pattern has been matched so far. (A short illustration of this numbering appears just after this list.)
- Processed match results are substituted for the corresponding macro references in the remaining sensor directive values: host_profile, service_profile, service, check_command, command_arguments, externals_arguments, instance_suffix, instance_cmd_args, and instance_ext_args. In the case where cardinality is multiple, you will need to have instance_suffix defined in some way that depends on such macro references, such that each generated service instance will have a unique name. Also especially in that case, whatever forms of generated arguments your sensor produces will need to be instance-specific.
- Discovery results are sent to the server, using an authenticated connection. The server stores those results, and they can be viewed on the server using the autosetup tool.
- The server analyzes the received discovery results, validating not just individual sensor results but also ensuring that there are no contradictions between multiple service instances and between the configurations derived from multiple sensors. The analysis is also stored on the server, and it can be viewed using the autosetup tool.
- The validated discovery results are applied to the Monarch database, adding objects and populating fields as appropriate. Collisions with existing objects which do not change any of the fields are ignored. Collisions which do change some fields of the affected objects are raised as errors.
- Database changes are rolled back upon error (e.g., missing base object) or user instruction (for a dry run).
- Conversely, if all went well and the user wanted the configuration changes to go live, the database changes are committed.
- If database changes were attempted, then whether or not they were persisted, the individual changes are appended to the discovery analysis, viewable using the autosetup tool. This provides the necessary feedback to humans for dry runs, and forensic information for successful or failed live-action (production) runs.
- If persistent configuration changes were made, externals are built for this one GDMA client machine.
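To make the $MATCHEDn$ / $SANITIZEDn$ numbering concrete, here is a small made-up illustration; the pattern, the command line, and the resulting values are all hypothetical:

# Hypothetical sensor pattern with two capture groups:
#     pattern = "--name\s+(\S+)\s+--port\s+(\d+)"
# One match against the process command line
#     /usr/local/bin/widgetd --name widget@7 --port 8443
# produces, for that single match:
#     $MATCHED1$   = "widget@7"      $MATCHED2$   = "8443"
#     $SANITIZED1$ = "widget@7"      $SANITIZED2$ = "8443"
# (with sanitization = "-.@_a-zA-Z0-9", nothing is dropped from these particular strings).
# A second match of the same pattern would carry its own $MATCHED1$/$MATCHED2$ pair.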
Construction of Auto-Setup Instructions Files
There are two types of declarations in the files used to configure Auto-Setup: global directives, and sensor definitions. Currently, all of the global directives appear in the trigger file, except for format_version, which appears in the instructions file. All sensor definitions appear in the instructions file.
A future version of Auto-Setup might allow certain sensor directives to be specified outside of sensor definitions, to provide global defaults. A typical example would be providing a default sanitization value.
Sensors are at the heart of auto-discovery. Successful discovery of a given resource involves specifying the following in a sensor definition:
- a resource type (e.g., file existence, open port)
- resource path(s) (this may or may not be required, depending on the resource type)
- a resource match pattern
- what host profile, service profile, or service to configure upon a match
- what resource-matching details to include in the discovery results for this resource
All of that information is specified by sensor directives, which are bundled into separate sensor definitions, one per resource that you wish to potentially match. Reference material on those directives can be found in the Quick Reference for Auto Setup and Configuration Reference for Auto Setup pages. Here, we will concentrate on the underlying conceptual material.
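For orientation only, here is a bare-bones sketch of what an instructions file looks like. The format_version value, the sensor name, the pattern, and the service name are illustrative; consult the reference pages just mentioned for the exact values and the full set of directives:

format_version = "1.0"

<service "Example foo daemon">
    type = full_process_command
    cardinality = "single"
    pattern = "/usr/sbin/foo_daemon(?:\s|$)"
    service = "foo-daemon"
    externals_arguments = "20!10"
</service>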
Sensor Types and Mechanisms
In this section, we enumerate all the possible auto-discovery sensor mechanisms we support, and the type of pattern matching we support for each mechanism. The list here is exhaustive, but the catalog of individual sensors and their respective details is presented instead in the Catalog of Supported Sensor Types section. The sensor mechanisms themselves fall into three major categories: fixed, static, and dynamic.
- Fixed sensors represent basic data about the machine you are running on, irrespective of any services you may or may not run on the machine. These sensors are necessarily available on all platforms, regardless of whether any particular services are running or even installed or configured. Typically, fixed sensors would be used to establish the host profile to be used, rather than particular service profiles or services. But to keep things general, we support any of those kinds of objects as the configuration target of a fixed-sensor match.
- Static sensors probe artifacts like files that should be stable and available even if a particular service happens not to be running at the time when the sensor is run. Using static sensors is generally preferred over using dynamic sensors because otherwise we cannot tell the difference between a service not being appropriate for monitoring on this machine, and that service just happening to be down at the moment auto-discovery is run.
- Dynamic sensors probe artifacts like processes and port numbers that are only available when the services to be sensed are currently running. Dynamic sensors may be useful in situations where static data is insufficient to tell precisely what services should be monitored, but their use demands that other administrative procedures be in place to correct any errors of omission that might arise if the desired services happen to be down at the time of auto-discovery.
The specific supported sensor mechanisms are:
Fixed Sensors
These comprise:
- OS-related system parameters, typically matched against a limited set of fully known alternatives:
  - OS type (windows, linux, solaris, or aix, reflecting the various operating systems on which we support GDMA)
  - OS version (the specific string format and content depends on the OS type and the OS manufacturer's conventions over time)
- basic machine-architecture determination, matched against a limited set of fully known alternatives:
  - OS bit-width (32 or 64)
  - platform architecture, reflecting the CPU processor:
    for Windows: intel (might also be arm in the future)
    for Linux: intel (or powerpc, in rare instances)
    for Solaris: sparc or intel
    for AIX: powerpc
There are, of course, many sub-types of platform architectures, such as those that support different variations of multi-media instruction sets. We have not historically had any need to distinguish platforms at that level, so we will only evolve the system to support auto-discovery of such detail if it is actually seen to be useful. Similarly, there is no support for the time being to discover what type of GPU or other accelerator hardware might be available, or what type of storage (network; local spinning disk; local SSD) is available.
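As a sketch of how a fixed sensor might be written, the definition below selects a host profile based on the OS type. The type name os_type and the profile name are assumptions made for this illustration; the real type names are listed in the Catalog of Supported Sensor Types.

<service "Linux host profile">
    # "os_type" is an assumed type name for this sketch; check the sensor catalog for the real one.
    type = os_type
    cardinality = "single"
    pattern = "linux"
    host_profile = "gdma-linux-host"
</service>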
Static Sensors
These comprise:
- existence of particular files, according to a given path pattern
- existence of particular symlinks, according to a given path pattern
- existence of particular directories, according to a given path pattern
- existence of particular mounted filesystems, according to a given mount-point path (not device name) pattern
- particular content of particular files, according to a given simple line-by-line search pattern (similar to line-by-line matching in egrep, but implemented using the full expressiveness of Perl regex pattern matching)
The set of static-sensor capabilities may evolve over time, depending on what customers report is required for practical use. For instance, you might need to parse complex data structures in a config file, rather than just using line-by-line matching, to ensure that each successful match occurs only within the appropriate surrounding context. That could happen, for instance, if the config file contained a companion per-instance enabling flag, kept separate from the data that specifies the details used to determine the service-instance configuration.
Looking ahead to when you learn about cardinality, note that even in the line-by-line scenario, a <service> sensor definition of type file_content may be declared with cardinality = "multiple".
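Tying that together, here is a hedged sketch of a file_content sensor that generates one service instance per matching line. The file path, the line format, and the service name are invented for this example, and the exact way the resource directive is specified for this sensor type should be checked against the Configuration Reference:

<service "Application listeners">
    type = file_content
    # Hypothetical config file to scan, one "listener <name> <port>" line per listener.
    resource = "/etc/myapp/listeners.conf"
    cardinality = "multiple"
    pattern = "^listener\s+(\S+)\s+(\d+)"
    sanitization = "-.@_a-zA-Z0-9"
    service = "myapp-listener"
    instance_suffix = "_$SANITIZED1$"
    instance_ext_args = "$SANITIZED2$!20!10"
</service>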
Dynamic Sensors
These comprise:
- existence of particular running services on the GDMA client machine, identified by their platform-specific service name (and if necessary, their platform-specific service-instance data) instead of by their process-command-line data
- existence of particular running processes on the GDMA client machine, according to a given process-command-line pattern matched against the full command including all the process arguments (subject to unavoidable truncation on some platforms)
- existence of particular already-open ports in a listening state on the GDMA client machine
- existence of particular open sockets named in the filesystem on the GDMA client machine
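As with the other categories, a dynamic sensor is written as an ordinary sensor definition. The sketch below probes for a listening port; the type name open_local_port and the service name are assumptions made for illustration, so confirm the real type name in the Catalog of Supported Sensor Types:

<service "Local PostgreSQL listener">
    # "open_local_port" is an assumed type name for this sketch.
    type = open_local_port
    cardinality = "first"
    pattern = "5432"
    service = "postgres-listener"
</service>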
Sensor Match Cardinality and Related Directives
A successful <service> sensor may match multiple instances of the pattern on the GDMA client. If so specified in the sensor definition, each successful match will generate a separate collection of auto-discovery data sent to the GroundWork Monitor server for analysis and corresponding instantiation of services. A given sensor definition should only specify at most one service_profile or service, so all of the corresponding services associated with that target configuration object can be treated in exactly the same manner when it comes to generating multiple instances of them within Monarch.
The default action is to expect only one copy of the resource to be found. If other copies are found, the discovery will fail. This decision corresponds to a cardinality = "single" setting. You may instead specify cardinality = "first" when several matches are expected but only one service is wanted to represent them; for example, you might have more than one httpd process running to implement what is effectively just one copy of the Apache Web Server, and you need only one GroundWork service to represent monitoring of the entire collection of processes. Conversely, you may instead specify cardinality = "multiple" when each matched copy of the resource should be monitored separately; for example, you might have many copies of a turbine process, each one checking the state of a separate generator in a wind farm. In that case, some unique identifier for each turbine ought to be matched and used as part of the service-check command-line arguments, and that identifier will originate in the auto-discovery results. See below for how such identifiers are percolated into customization of service externals and thus ultimately, the executed service-check arguments.
There are some corner cases to consider. For example, note that Apache Web Server spawns multiple httpd processes for a single instance of the web server. So if you actually had multiple copies of Apache running on the same machine (listening on different ports), you would be tempted to specify cardinality = "multiple". But since each copy will run multiple copies of the httpd process, you don't want that cardinality to mean that every individual process will be treated as a separate copy of Apache. A solution in this case could be to know something in advance about the nature of the copies that might be running, and to tailor the sensor definitions accordingly. For instance, if each httpd process is run with a "-f /path-to/httpd.conf" option, that effectively tells you which copy of Apache that process belongs to (so long as you don't have aliasing of multiple such copies across multiple containers running on your machine). So you could set up separate sensors, one per known configuration file, and each with cardinality = "first":
<service "Standard copy of Apache"> type = full_process_command cardinality = "first" # The path in this pattern is specific to my own deployments. # $MATCHED1$ will be "std" for this sensor. pattern = "/httpd(?=\s).*?\s-f\s+/etc/apache2-(std)/conf/httpd.conf(?:\s|$)" service = "apache-web-server" instance_suffix = "_$MATCHED1$" instance_ext_args = "$MATCHED1$!20!10" </service> <service "Alternate copy of Apache"> type = full_process_command cardinality = "first" # The path in this pattern is specific to my own deployments. # $MATCHED1$ will be "alt" for this sensor. pattern = "/httpd(?=\s).*?\s-f\s+/etc/apache2-(alt)/conf/httpd.conf(?:\s|$)" service = "apache-web-server" instance_suffix = "_$MATCHED1$" instance_ext_args = "$MATCHED1$!20!10" </service>
When the cardinality of a <service> sensor definition is set to multiple, individual services generated by the service_profile or service will be turned into services with one or more instances, corresponding to the number of copies of the resource that are matched during auto-discovery. For this to work, you must have an instance_suffix directive defined in this sensor definition, something like the following:
instance_suffix = "_$SANITIZED3$"
You will typically want to specify an instance_ext_args directive as well, to customize the service-externals arguments on a per-instance basis.
That particular instance_suffix definition would use an underscore and the sanitized result of the third captured pattern-match element to form the service instance suffix as saved in Monarch for generating service instances. After a successful permanent configuration, you will be able to observe this setup in the service-check screen for this host-service in the Nagios configuration UI. The sanitization for each sensor definition is given by its sanitization directive, which will apply uniformly across all captured pattern-match strings to create macros of the form $SANITIZED#$, where # is a positive integer. This directive specifies the list of acceptable characters that will pass the sanitization testing; all other characters in each captured string will be dropped. A generally safe definition is the following, which avoids using shell metacharacters in places they don't belong:
sanitization = "-.@_a-zA-Z0-9"
That sanitization value will probably suffice for most uses. If a backslash is to be allowed, it must be doubled in the sanitization value. A # character included in the value must also be preceded by a backslash. You are generally advised to stick with the basics.
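To see what that filtering does in practice, here is a made-up captured string containing disallowed characters:

# With sanitization = "-.@_a-zA-Z0-9", a captured string such as
#     db(prod) #1
# is reduced to
#     dbprod1
# because "(", ")", the space, and "#" are not in the allowed-character list.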
Getting back to the cardinality of a <service> sensor definition, when it is instead set to first or single, an instance_suffix declaration is formally optional, but its use in practice depends on the intended context. For a <service> sensor definition with cardinality first or single and no instance_suffix directive, there will be only one copy of each of the base services referenced by the service_profile or service, and the name of each such service will be used unaltered, without any associated service instances. However, you can see that we did use instance_suffix directives in the example above, because we wanted to generate two different service instances of the same base service (apache-web-server). After full expansion as it appears in the monitoring, there will be no base service present in the configuration for this host; there will instead be either apache-web-server_std or apache-web-server_alt, or both.
Given what we just described, you can see it would be a disabling error to have a sensor definition with cardinality = "multiple" but no instance_suffix declared, since then there would be no means to generate individual services corresponding to the separate matched resources. Similarly, it would be a disabling error to have multiple sensor definitions with the same service_profile or service, cardinality either first or single, and no instance_suffix directive, that independently match separate copies of the resource. In this latter case, there would be no means to uniquely identify the different copies of the generated services.
Service Argument Processing and Customization
The purpose of auto-discovery is to:
- find out what resources need to be monitored;
- decide what services and service instances will be used to monitor those resources;
- provide information on how those services and service instances will be configured.
We have just described above the use of these directives to define the desired services and service instances:
cardinality, pattern, transliteration, sanitization, service_profile, service, instance_suffix

The remaining directives control how the generated service checks and service externals are parameterized:

check_command, command_arguments, externals_arguments, instance_suffix, instance_cmd_args, instance_ext_args

- check_command, if specified, is used to override the check command inherited from the associated service template.
- command_arguments, if specified, ends up being combined with the check_command (either explicit in the sensor definition, or from the generic service from which the host service is derived) to form the "Command line" field for the service.
- externals_arguments, if specified, is used to override the externals arguments that are otherwise inherited from the generic service from which the host service is derived.
- instance_suffix, if specified, is used both to force service instances to be used instead of the base service, and to provide the Instance Name Suffix for each configured instance of a service.
- instance_cmd_args, if specified, provides the command-line arguments for each service instance, for commands run directly on the Nagios server. (It's possible that Auto-Setup may create active services in the configuration for the host, not just passive services whose checks will be executed on the GDMA client.) Also, even in the case of configuring a passive check, these arguments may be used as part of a freshness check if the GDMA client does not report monitoring results for some time.
- instance_ext_args, if specified, is used to override the externals arguments otherwise inherited from the base service (those possibly being themselves inherited from the parent generic service), to provide externals arguments for each service instance.
With those ideas in mind, let's look at how those directives would be used in practice.
The main purpose of GDMA auto-discovery is to configure service checks that will be run directly on the GDMA client machine. Thus the check_command, command_arguments, and instance_cmd_args directives, which control commands run directly on the GroundWork server, seem at first glance to be extraneous. However, they can be used to configure the freshness-check commands for the GDMA client services. Specifically, the command_arguments directive defines a !-separated set of values to be substituted for $ARG1$, $ARG2$, and similar macro references in the command definition provided in Monarch. Similarly, the instance_cmd_args directive defines a !-separated set of values to be substituted the same way, but for the particular service instances that are generated by the sensor definition that includes an instance_suffix directive and the instance_cmd_args directive.
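As an illustrative sketch only (the number and meaning of the argument positions depend entirely on the command definition in Monarch, and the values here are invented), such directives might look like:

# Values substituted into $ARG1$, $ARG2$, ... of the server-side command for the base service:
command_arguments = "$SANITIZED1$!300"
# Per-instance values, for sensors that also define an instance_suffix:
instance_cmd_args = "$SANITIZED1$!300"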
In contrast, the externals_arguments, instance_suffix, and instance_ext_args directives are used to configure service externals, which define how service checks are run directly on the GDMA client, not on the GroundWork server. Just below is the standard model for one set of service externals configured in this manner. In this example, a check_foo plugin is called, and it is provided with whatever arguments it needs. Obviously, the particular plugin name and arguments in the Command setting will be specific to your own situation, but we're looking here at the overall use of macro substitutions to automatically customize these externals.
Check_$BASESERVICEDESC$[$INSTANCE$]_Enable="ON"
Check_$BASESERVICEDESC$[$INSTANCE$]_Service="$SERVICEDESC$"
Check_$BASESERVICEDESC$[$INSTANCE$]_Command="check_foo -t $INSTANCESUFFIX$ -w $ARG1$ -c $ARG2$"
Check_$BASESERVICEDESC$[$INSTANCE$]_Check_Interval="1"
Check_$BASESERVICEDESC$[$INSTANCE$]_Timeout="30"
- $BASESERVICEDESC$ will be expanded to the name of the base service, without any indication of a service instance.
- In contrast, $SERVICEDESC$ will be expanded to the full final name of the service (including the instance name suffix, if service instances happen to be in play for this service).
- $INSTANCE$ will be expanded with a simple series of numbers, counting up from 1, the same for each set of related service externals (as in the group of externals directives shown just above) but incrementing for additional copies of the same service externals. This value is used by GDMA to group the multiple service-external definitions appropriately. It explains the utility of the fixed [1] string that has been used for a long time in service externals, which people have long wondered about. Now there is the ability to dynamically determine that value. The $INSTANCE$ macro can be used even if only one copy of the service externals might be generated, say only for the base service and not for any service instances. In that case, it will expand to 1, as you would expect.
- $INSTANCESUFFIX$ will be expanded to just the instance name suffix, but with any leading underscore omitted.
- Finally, macros like $ARG1$ and $ARG2$ will be drawn from the base-service "Externals arguments" (if no service instances are in play for this service) or the per-service-instance "Externals arguments" (if service instances are in play for this service). You will need to define your own conventions for how the $ARG#$ macros are used in your externals definitions, and therefore what the semantics will be for each fixed argument position in the externals_arguments and instance_ext_args directives.
To support the sample externals shown, we would expect that some pattern matching will have been done on the resources as they were discovered, and an instance_suffix directive perhaps like the following would be used to refer to some captured part of that pattern matching and use it as the service instance name suffix:
instance_suffix = "_$SANITIZED2$"
With that in place, a string like unit_340028 might be obtained as the second captured match value during the pattern matching, be directly accessible in sensor-directive values as the $MATCHED2$ macro if that were desired, be cleaned up by transforms defined by the transliteration and sanitization directives, and finally be available for use in sensor-directive values in its clean form as the value of the $SANITIZED2$ macro. Prepending an underscore to that string would make it suitable for appending to a base choo_choo service, making the full service-instance name be choo_choo_unit_340028. When externals are built, the instance-name suffix of _unit_340028 would have its leading underscore removed, and the string unit_340028 would be substituted for any reference to the $INSTANCESUFFIX$ macro in service externals for that particular service instance. Notice how transliteration and sanitization could be used in the general case to make some complex pattern-matched string both suitable for use as the service instance name suffix and safe to use without any worries about interpretation of shell metacharacters when the $INSTANCESUFFIX$ value is used in the service-externals command line.
Let's be very clear about the usage of the externals_arguments, instance_suffix, and instance_ext_args directives. For that, it's best to describe them in a different order.
- The instance_suffix directive must be specified in every matching sensor definition that creates a given service if more than one instance of that service may be applied to a single host, whether derived from one sensor definition or from multiple sensor definitions. If this directive is not specified in any matching sensor definition and more than one copy of the service is generated, then no other similar matching sensor definition can specify this directive, all the other details of the service configuration must also match, and only one copy of the base service will be created on the host.
- The instance_ext_args directive will be used only if at least one explicit service instance (using some value of the instance_suffix directive) is created on the host. In this case, the instance_ext_args will define a !-separated set of values to be substituted into $ARG1$, $ARG2$, and similar macro references in the service externals for each related service instance when externals are built on the server. These arguments will often reference some of the captured pattern-match elements from a successful sensor match, but they need not reference those elements in the same order as they were matched against the resource object, and additional text other than explicitly-matched characters may be provided in the individual argument fields.
- The externals_arguments directive also provides values to be substituted into $ARG#$ macro references. This directive will be used in two cases.
  - If the base service is created on the host (that is, there are no service instances created), the externals_arguments will define a !-separated set of values to be substituted into $ARG1$, $ARG2$, and similar macros in the service externals for the base service when externals are built on the server.
  - If service instances are created on the host using the instance_suffix directive, but no instance_ext_args directive is supplied in some sensor definition, the full value of externals_arguments will simply be inherited by all the service instances created from that sensor definition. That would not prevent other matching sensor definitions that generate other service instances from having the instance_ext_args directive defined so as to override the externals arguments for the base service. But if multiple matching sensors define externals_arguments with conflicting values, that would constitute an error even if instance_ext_args ends up being used for some or all of the service instances. So if you did want some service instances to inherit the externals arguments from the base service and some to not inherit, all sensors that generate service instances that should inherit must have matching expanded values for externals_arguments and no instance_ext_args directives, and it would probably be best if all sensors that generate service instances that should not inherit did not have externals_arguments defined. That latter case would be interpreted as having no conflict, as opposed to forcing inheritance of the base-service externals from the generic service.
With all of that said, let's work out an example, so you can see these elements in play for customizing the generated service externals. Suppose we have this sensor definition:
<service "Train"> type = full_process_command cardinality = "multiple" pattern = "/train_controller\s+--train\s+(\S+)" # We use a standard sanitization pattern, not particular # to this specific sensor, even though we don't expect it # to change any characters in our matched string. This # is done for general security purposes. sanitization = "-.@_a-zA-Z0-9" service = "choo_choo" instance_suffix = "_train_$SANITIZED1$" # The arguments specified here are the train name, # warning threshold, and critical threshold. Ideally, # these are defined in much the same order as they are # used in the service externals that reference them. instance_ext_args = "$SANITIZED1$!20!10" </service>
We have not specified the check_command, command_arguments, or instance_cmd_args directives, because we are satisfied with the default check_gdma_fresh command that we will have specified as being inherited from the gdma service template we are using for the choo_choo service (see below). That check_gdma_fresh command will simply generate a "Stale Status" warning state for the service, and it doesn't need to know anything about the specific service name or service-instance name it is being run for. (That identification will appear as part of the name of the stale service when the freshness check is run, so there is no reason to repeat it in the freshness-check output.)
Notice that for clarity, we use a comment to specify what each argument field means, since they are passed along and referenced as positional arguments, not individually named values. That small bit of documentation can avoid much confusion later on, when you need to use the correct $ARG#$ macro references in service externals, or when you want to compare what's in your discovery instructions with how the externals actually use those macros.
That sensor is supposed to match commands that look like this:
/path/to/train_controller --train unit_123456
Matching that command would produce a service instance named choo_choo_train_unit_123456.
Let us further suppose that the choo_choo service has the choo-choo service external attached, defined very generally as:
Check_$BASESERVICEDESC$[$INSTANCE$]_Enable="ON"
Check_$BASESERVICEDESC$[$INSTANCE$]_Service="$SERVICEDESC$"
Check_$BASESERVICEDESC$[$INSTANCE$]_Command="check_train -t $INSTANCESUFFIX$ -w $ARG2$ -c $ARG3$"
Check_$BASESERVICEDESC$[$INSTANCE$]_Check_Interval="1"
Check_$BASESERVICEDESC$[$INSTANCE$]_Timeout="30"
(Notice that we make no use of $ARG1$ in those service externals, inasmuch as we use $INSTANCESUFFIX$ instead to serve the same purpose and be a bit more descriptive. This construction does mirror what we specified as the instance_ext_args value, referencing the $ARG2$ and $ARG3$ macros to access the other two values provided by the matched sensor.)
Finally, let's suppose that there are two copies of the command running on the GDMA client when auto-discovery is run:
/path/to/train_controller --train unit_246801
/path/to/train_controller --train unit_135790
- The train name (the value of the --train option on the command line) is captured during the pattern matching, via the parenthesized (\S+) part of the pattern.
- General sanitization of the matched values occurs. In the present case, because the train names only use a simple set of safe characters, and there are no rogue processes running, this does not affect the captured values. But it's good to have this in place anyway as a security measure.
- Service instance suffixes _train_unit_246801 and _train_unit_135790 are generated from the instance_suffix value. We added the word train in there just to show the most basic way in which strings can be manipulated for such purposes; macro references can be used in the middle of other strings, not just as standalone objects.
- When the discovery results are processed on the server, the choo_choo service will be applied to this host. Since there are instance name suffixes provided in the auto-discovery results, there will be no base choo_choo service itself on this host. Instead, the host will ultimately have choo_choo_train_unit_246801 and choo_choo_train_unit_135790 services applied, implemented as service instances of the base choo_choo service.
- Service instance _train_unit_246801 will have instance-level externals arguments of unit_246801!20!10.
- Service instance _train_unit_135790 will have instance-level externals arguments of unit_135790!20!10.
- After the discovery results are processed on the GroundWork server and permanently committed there, the setup noted above will be observable in the "Service Check" screen for the choo_choo service on this host. When externals are built for this GDMA client, the following service externals should be generated:
Check_choo_choo[1]_Enable="ON"
Check_choo_choo[1]_Service="choo_choo_train_unit_135790"
Check_choo_choo[1]_Command="check_train -t unit_135790 -w 20 -c 10"
Check_choo_choo[1]_Check_Interval="1"
Check_choo_choo[1]_Timeout="30"
Check_choo_choo[2]_Enable="ON"
Check_choo_choo[2]_Service="choo_choo_train_unit_246801"
Check_choo_choo[2]_Command="check_train -t unit_246801 -w 20 -c 10"
Check_choo_choo[2]_Check_Interval="1"
Check_choo_choo[2]_Timeout="30"
Let's take one last look at that sensor definition. Since we know the expected form of a train name (unit_999999) contains no suspicious characters, we could have skipped sanitization and used these definitions instead, referencing the raw pattern-matched strings:
instance_suffix = "_train_$MATCHED1$" instance_ext_args = "$MATCHED1$!20!10"
Some unavoidable string validation may be automatically enforced when instance_suffix values are used on the server side to construct service names. If this validation fails, the overall Auto-Setup will fail. Such validation is necessary to reduce the risk of security problems arising when various punctuation characters might be interpreted as shell metacharacters and possibly introduce various code-injection vulnerabilities. Nonetheless, you should not depend on any such built-in validation; instead, apply sanitization yourself to guarantee that you are blocking potentially bad values. In general, resource elements that might be matched and passed along as part of the instance_suffix or as part of command or externals arguments should be designed to be free of most punctuation. Because shells have evolved over time to be more and more functional, they have tended to use more and more punctuation characters for various purposes, which leaves the set of "safe" characters rather small. In general, if you restrict the matched arguments to ASCII alphanumerics plus just these characters:
- . @ _
you should generally stay out of trouble. The transliteration and sanitization directives can help with cleaning up the strings that are matched by your sensor patterns. If you have any questions about this, ask GroundWork and we can provide some guidance as to why we believe other punctuation characters are potentially risky.