Release Notes for GDMA 2.7.1

Changes in this release

Because of the logging fixes described below, all users of Windows GDMA are advised to upgrade to the GDMA 2.7.1 release.

GDMA 2.7.1 provides the following new features and updates:

Logging

  • On the Windows platform, automatic logfile rotation has not been limiting the growth of the spooler logfile when GDMA remains up for a long period (GDMA-256). Also, because the Windows filesystem often does not update externally-visible file metadata while a file is being written, the fact that the spooler logfile has grown huge could stay hidden until the spooler happens to be stopped. Together, these facts make Windows GDMA releases 2.6.0 through 2.7.0, and GDMA 2.7.0 in particular, a potential source of trouble with respect to unbounded logfile growth. In those releases, the only way to mitigate the problem is to set the Enable_Local_Logging option to "off" in the gdma_auto.conf file on the GDMA client. That was the default before 2.7.0, but the default was changed in 2.7.0. (Trying to make this change in host externals on the server will not work. In fact, the reason this problem typically hits Windows and not the other platforms is that our standard Windows host externals do disable logging; that setting, in combination with logging being enabled in the gdma_auto.conf file, is part of the conditions that drive the unbounded log growth. Conversely, in Windows GDMA 2.6.0 through 2.7.0, enabling logging in the host externals, or leaving it unset there while logging is enabled in gdma_auto.conf (as it is in 2.7.0), triggers a separate Windows-specific logging bug, described in the next item. So that is not an option either.) This behavior has been fixed in GDMA 2.7.1.
  • In Windows GDMA 2.6.0 or later, enabling local logging in the host externals, or in Windows GDMA 2.7.0 simply not explicitly turning it off in the host externals, would eventually cause the GDMA daemons to deadlock and no longer run service checks (GDMA-256). This has been fixed in GDMA 2.7.1. Enabling logging is now safe on this platform, and recommended in order to locally save the results of service checks for forensic analysis when things go wrong.
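
For clients that must stay on Windows GDMA 2.6.0 through 2.7.0 for a while, the workaround described in the first item above is a one-line change in the client-side gdma_auto.conf file. The following Python sketch shows one way such an edit might be scripted. It assumes the file uses Name = "value" lines, as in the GDMA_Multihost = "on" example elsewhere in these notes; the set_option helper is hypothetical and is not part of GDMA.

```python
import re

def set_option(conf_text: str, name: str, value: str) -> str:
    """Return conf_text with option `name` set to the quoted `value`.

    Assumption: options are written as  Name = "value"  on their own line.
    If the option is not present, a new line is appended instead.
    """
    # Match the option name and "=" at the start of a line, then the old value.
    pattern = re.compile(
        r'^([ \t]*' + re.escape(name) + r'[ \t]*=[ \t]*)[^\r\n]*',
        re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: f'{m.group(1)}"{value}"', conf_text)
    if count == 0:
        # Option missing: append it, adding a newline first if needed.
        sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
        new_text = f'{conf_text}{sep}{name} = "{value}"\n'
    return new_text

# Hypothetical file content, for illustration only.
before = 'Enable_Local_Logging = "on"\n'
after = set_option(before, "Enable_Local_Logging", "off")
```

The sketch only transforms text; reading and writing the actual file, and restarting the GDMA services afterward, are left to the administrator. Upgrading to GDMA 2.7.1 remains the recommended fix.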

Plugins

  • The check_oracle_db plugin has been overhauled (GWMON-9340, GDMA-424). Non-Windows GDMA platforms have a revised copy of the check_oracle_db script installed, along with a slightly modified copy of its sidekick check_oracle_by_jdbc script. These changes address a number of corner cases in which the plugin was not producing correct output. Also, the -n option has been revised to support more reliable operation (see below).
    • check_oracle_db better recognizes errors in connecting to the database, instead of falsely reporting an OK status without any performance data.
    • check_oracle_db no longer generates a misleading OK result when in fact the underlying SQL query produced no result rows.
    • Additional types of query-execution exceptions are now correctly sensed as failures.
    • Command-line option flags are now parsed correctly. This prevents argument-position errors, such as swapping the order of the warning and critical thresholds, which would have gone unnoticed by the previous version of the plugin. (Correct ordering of the plugin options is no longer presumed. With this new version and the correct option flags, options may now be specified in any order.)
    • Much better debug output is now provided, under control of the new -d option. This can assist in manual-run situations where there is some mystery as to what the script is seeing internally as results from the underlying SQL query.
    • Most importantly, the -n option value can now be the heading of the SQL query result column whose values you wish to compare against the warning and critical thresholds. This is far more reliable than using a column number, partly because the column numbering starts with 0 (which is probably unexpected, and was formerly undocumented), and because there was no obvious way to check that you had actually selected the particular column you were interested in. Naming the column heading provides automatic validation that you are in fact checking the values in the column you wanted to check, so use of this new feature is highly recommended. As of the availability of this version of the plugin, using a column number is therefore deprecated, in favor of the heading.
    • A proper usage message is now available. It can be displayed by running the plugin without any arguments. It is quite extensive, and documents in detail the requirements for the SQL query to be run.
    • A number of other internal improvements have been made as well, such as emitting more descriptive error messages.
  • The check_cpu_load_percentage.vbs plugin in Windows GDMA has been overhauled (GDMA-380). A variety of internal improvements have been made. The key externally-visible change is that optional "-r retries" and "-i retry_interval" options have been added, to provide for internal retries if pulling WMI data is sometimes erratic. You can run the script as:

    cscript /nologo gdma\libexec\v2\check_cpu_load_percentage.vbs

    to see the complete usage message.

  • The check_system_uptime.pl plugin has been upgraded to produce perfdata (an Uptime item, with its value being seconds since boot) (GDMA-278). Developing a sensible graph for this data is a task still outstanding. (A simple time-since-boot that grows linearly forever would not be terribly interesting, and might have difficulty in scaling the y-axis appropriately, though it would show a reset-to-zero on every reboot. However, the current time-viewport of such a graph might not include any reboots. Best would be if we showed such data on a logarithmic y-axis, and when a critical or warning state is present, there should be a red or yellow area stretching for the entire vertical height of the graph to very visibly mark the reboot event.)
  • Service-externals commands now support the use of double-quote characters within the command definition (GDMA-302). Since the command as a whole is enclosed in double-quote characters, and since we don't want to adopt backslash-escaping lest that be confused with Windows pathnames, we follow the simple convention that you can use two consecutive double-quote characters to produce an individual double-quote character in the command to be executed. Thus for example, one might use:

    Check_my_service[1]_Command="$Plugin_Directory$/check_dummy 2 'a ""word"" for you'"

    to specify that the command to be run on a Linux GDMA client would be:

    /usr/local/groundwork/nagios/libexec/check_dummy 2 'a "word" for you'

  • Plugins that produce multi-line output including performance data had their results corrupted, resulting in both truncated status text and bad perfdata (GDMA-447). This has been addressed by converting the plugin output to a single-line format, with original lines separated by "/" characters. This at least makes all the status text and perfdata available and correct. However, in this format, all of the status text will appear as the short plugin output, and there will be no separate long plugin output field when Nagios parses the check result. This situation may be further improved in future versions of GDMA.
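
The doubled double-quote convention for service-externals commands (GDMA-302) can be illustrated with a short Python sketch. This is not GDMA's own parsing code; the unquote_command function is a hypothetical helper, and the macro substitution at the end is a plain string replacement used only to reproduce the example above.

```python
def unquote_command(definition: str) -> str:
    """Extract the command from a double-quoted service-externals value.

    Inside the enclosing quotes, two consecutive double-quote characters
    stand for one literal double-quote character.
    """
    if len(definition) < 2 or definition[0] != '"' or definition[-1] != '"':
        raise ValueError("command definition must be enclosed in double quotes")
    inner = definition[1:-1]
    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == '"':
            # A quote is only legal here as the first half of a doubled pair.
            if i + 1 < len(inner) and inner[i + 1] == '"':
                out.append('"')
                i += 2
                continue
            raise ValueError(f"unpaired double quote at offset {i + 1}")
        out.append(ch)
        i += 1
    return "".join(out)

# The value from the example above, as it appears after the "=" sign.
definition = '''"$Plugin_Directory$/check_dummy 2 'a ""word"" for you'"'''
command = unquote_command(definition)
# Illustration only: expand the macro with the Linux path shown above.
command = command.replace("$Plugin_Directory$", "/usr/local/groundwork/nagios/libexec")
```

With this input, command ends up equal to the Linux command line shown in the example. Backslashes pass through untouched, which is the reason given above for not adopting backslash-escaping.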

Polling

  • Timeouts and other system-level failures when running service checks were not being reflected in the state of the poller service, and this could cause some unwanted flapping of that service (GDMA-457). This has now been corrected, so timeouts at the service-check level will be percolated up as a Warning status for the poller service, and more-serious system failures at the service-check level will be percolated up as a Critical status for the poller service.
  • The operation of multihost mode (GDMA_Multihost = "on") on the Windows platform has been improved so there is less self-imposed dead time between monitoring of successive hosts once the configured Max_Concurrent_Hosts limit has been reached (GDMA-458). Formerly, if the GDMA client monitored a very large number of other hosts, this dead time could stretch the overall polling-cycle time beyond the configured Poller_Proc_Interval, for no real benefit. The associated code changes also prevent the temporary buildup of an excessive number of zombie processes in each polling cycle on Windows. UNIX-like platforms are not affected by these changes, because they did not suffer those defects in the first place.
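
The new mapping from service-check failures to the poller service state (GDMA-457) can be sketched as follows. The outcome labels and the poller_status function are hypothetical, and the rule that the most severe outcome in a polling cycle wins is an assumption; the notes above state only that timeouts yield Warning and more-serious system failures yield Critical. The numeric values follow the standard Nagios status codes.

```python
from enum import IntEnum

class Status(IntEnum):
    """Standard Nagios status codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2

# Hypothetical outcome labels for individual service-check executions.
SEVERITY = {
    "ok": Status.OK,                    # check ran; its own result is reported separately
    "timeout": Status.WARNING,          # check timed out
    "system_failure": Status.CRITICAL,  # more-serious system-level failure
}

def poller_status(outcomes):
    """Return the poller service status for one polling cycle.

    Assumption: the most severe outcome in the cycle determines the status.
    """
    return max((SEVERITY[o] for o in outcomes), default=Status.OK)

status = poller_status(["ok", "timeout", "ok"])
```

Here status is Status.WARNING; adding a "system_failure" outcome to the list would raise it to Status.CRITICAL.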

TLS certificate handling

  • In GDMA 2.7.0, even on non-Windows platforms, automatic TLS certificate downloading was not working correctly in all cases (GDMA-455). We have substantially expanded our test coverage and improved the code so intermediate certs are now fully supported. Furthermore, the set of root CA certificates we validate against has been updated in GDMA 2.7.1, to a very recent copy of what Mozilla recognizes in the Firefox browser. As a result, use of automatic cert downloading in GDMA 2.7.0 is now deprecated in favor of using GDMA 2.7.1.
  • GDMA 2.7.1 dynamically downloads and uses certs automatically on each HTTPS connection, just as web browsers do. There is no longer a "--gdma_download_certs" installer option or a Download_Certs_Automatically config-file option to control that behavior. There is instead a "--gdma_log_questionable_certs" installer command-line option and a corresponding Log_Questionable_Certs config-file option. If downloaded certs do not validate, this new option controls whether those certs are analyzed further to determine the nature of the failure, with that information logged to assist in correcting certificate problems. This option is enabled by default, so the extra help it provides is not hidden behind an option you might not know about. Also, the Log_Questionable_Certs option must be enabled for the installer "--gdma_download_self_signed_certs" option (the Allow_Downloaded_Self_Signed_Certs config-file option) to work. Using self-signed certs is generally discouraged, and this dependency forces the logging of information about the potentially insecure acceptance of such certs.
  • The Allow_Downloaded_Self_Signed_Certs config-file option still defaults to being enabled in this release. That default exists mostly for convenience in setting up proof-of-concept first installs of GDMA. If you wish to run a secure installation, you must disable automated acceptance of downloaded self-signed certs. You can do so at install time (in the installer UI, via selection in the installer text mode, or via the "--gdma_download_self_signed_certs" installer command-line option or equivalent option-file option in an unattended-mode install), or after installation is complete by disabling the config-file option in the client-side gdma_auto.conf file.
  • The prefixing of "INSECURE:" to general service-check results when they are being sent via HTTPS using a downloaded self-signed cert has been removed. Instead, this prefixing is now restricted to just the internally-generated service-check result for the Poller_Service (e.g., typically gdma_poller), and even then it is now controllable at GDMA client install time (GDMA-455). This control is provided by the installer "--gdma_flag_self_signed_certs" option and the corresponding Flag_Downloaded_Self_Signed_Certs config-file option. This option defaults to being enabled; disabling it requires an explicit action on the part of the customer, who thereby assumes all risk when operating in such a mode. In GDMA 2.7.1, the prefixing controlled by this option will last only until the GDMA poller is restarted, so it should not be depended upon for an ongoing report of insecure operation. Using self-signed certs is still strongly discouraged because that mode of operation opens security holes, but they can be useful for demo purposes.
  • We have removed the config-file option to allow downloaded TLS certificates with bad server names (Allow_Downloaded_Bad_Server_Name_Certs), and its associated installer command-line option ("--gdma_download_bad_server_name_certs") (GDMA-455). That option made the GDMA clients inherently insecure, and our support for TLS certs has improved both in the GDMA client and in GW 8.1.0 so that said option is no longer useful anyway.
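
The option changes above can be checked mechanically when migrating a client configuration from GDMA 2.7.0. The following Python sketch is one possible approach, not a GroundWork tool. It assumes Name = "value" lines with "on" and "off" values and "#" comment lines, none of which is confirmed by these notes; parse_options and check_tls_options are hypothetical helpers.

```python
import re

# Config-file options these notes describe as removed in GDMA 2.7.1.
REMOVED_OPTIONS = {
    "Download_Certs_Automatically",
    "Allow_Downloaded_Bad_Server_Name_Certs",
}

# Assumed line format:  Name = "value"  (quotes optional in this sketch).
_LINE = re.compile(r'^\s*([A-Za-z_][\w\[\]]*)\s*=\s*"?(.*?)"?\s*$')

def parse_options(conf_text):
    """Parse option lines into a dict, skipping '#' comment lines (assumed syntax)."""
    options = {}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options

def check_tls_options(options):
    """Return a list of human-readable problems with TLS-related options."""
    problems = []
    for name in sorted(options.keys() & REMOVED_OPTIONS):
        problems.append(f"{name} is no longer an option in GDMA 2.7.1")
    # Both options are described above as enabled by default.
    self_signed = options.get("Allow_Downloaded_Self_Signed_Certs", "on")
    log_certs = options.get("Log_Questionable_Certs", "on")
    if self_signed == "on" and log_certs != "on":
        problems.append(
            "Allow_Downloaded_Self_Signed_Certs needs Log_Questionable_Certs enabled"
        )
    return problems

# Hypothetical configuration carried over from an older client.
sample = 'Download_Certs_Automatically = "on"\nLog_Questionable_Certs = "off"\n'
problems = check_tls_options(parse_options(sample))
```

For the sample shown, the sketch reports two problems: the removed Download_Certs_Automatically option, and self-signed cert acceptance (enabled by default) without Log_Questionable_Certs.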

Component updates

  • On non-Windows platforms, Perl has been upgraded to version 5.28.3. A select few Perl add-on packages have also been updated.
  • OpenSSL has been upgraded to version 1.1.1g.
  • On non-Windows platforms, Nagios Plugins have been upgraded to version 2.3.3.
  • OpenLDAP has been upgraded to version 2.4.50.

Miscellaneous

  • Error handling when spooling service-check results for later transmission to the server has been fixed (GDMA-451).
  • A benign but confusing warning message has been eliminated (GDMA-453). In the earlier GDMA 2.7.0 release, the message appears when GDMA is configured for sending in check results via HTTP/S, and the GDMA spooler is started in interactive mode for diagnostic purposes.

Known issues

  • Testing showed that the Windows GDMA build had problems accessing the GroundWork server using IPv6. Therefore, the Windows GDMA 2.7.1 release has been explicitly limited to having the poller and spooler contact the GroundWork server using IPv4. This issue will be revisited in a future release. Non-Windows GDMA builds are not known to have this limitation.

System requirements

  • For Windows GDMA 2.7.1:
    • The installer size is 82MB
    • The deployed software takes 280MB
  • For Linux 32-bit GDMA 2.7.1:
    • The installer size is 36MB
    • The deployed software takes 200MB
  • For Linux 64-bit GDMA 2.7.1:
    • The installer size is 39MB
    • The deployed software takes 210MB
  • For any platform:
    • Provision 80MB for logfiles. (This size is now limited by automatic log rotation and the retention of only one older version of each rotated logfile.)
    • Provision space for any downloaded or otherwise-installed plugins you may add after the initial deployment.
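
The figures above can be combined into a rough per-client disk budget. The Python sketch below does that arithmetic. It assumes the installer file stays on disk alongside the deployed software unless stated otherwise, which these notes do not specify; the minimum_disk_mb function and its parameters are hypothetical.

```python
# Sizes in MB, taken from the list above.
SIZES_MB = {
    "windows": {"installer": 82, "deployed": 280},
    "linux32": {"installer": 36, "deployed": 200},
    "linux64": {"installer": 39, "deployed": 210},
}
LOGFILES_MB = 80

def minimum_disk_mb(platform, extra_plugins_mb=0, keep_installer=True):
    """Estimate the disk space to provision for one GDMA 2.7.1 client.

    Assumption: the installer occupies disk space at the same time as the
    deployed software unless keep_installer is False.
    """
    sizes = SIZES_MB[platform]
    total = sizes["deployed"] + LOGFILES_MB + extra_plugins_mb
    if keep_installer:
        total += sizes["installer"]
    return total

windows_total = minimum_disk_mb("windows")  # 280 + 80 + 82 = 442
```

Any plugins added after the initial deployment should be included through extra_plugins_mb.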

Version compatibility

  • If plain HTTP is in use for client/server communication, GDMA 2.7.1 should be compatible with any release of GWMEE starting with 7.1.0, and possibly (though untested) going back to GWMEE 6.7.0. Note that plain HTTP has been deprecated for some time now.
  • If HTTPS is in use for client/server communication, GDMA 2.7.1 should be compatible with any release of GWMEE starting with 7.1.1. The main driver of compatibility is the forced use of TLS 1.2 in GDMA 2.7.1, so the connection does not fall back into older, now-insecure and deprecated TLS and SSL protocols.
  • GDMA Auto-Setup features require GDMA 2.6.1 or later, and GWMEE 7.2.0 or later. (GDMA Auto-Setup on GWMEE 7.2.0 requires a server-side tarball overlay to provide files which are built into GWMEE 7.2.1 and later releases.)
  • Sending check results from GDMA to the GroundWork Monitor server via HTTP/S instead of the NSCA transport requires GDMA 2.7.0 or later, along with GWMEE 7.2.1 (with some special modification) or a GW 8.X.X release on the server side.
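
The first three compatibility rules above lend themselves to a simple lookup. The Python sketch below encodes them; the function names and the result labels ("listed", "untested", "not listed") are hypothetical. Versions are compared as integer tuples so that, for example, 7.10.0 sorts after 7.2.0.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def gwmee_compatibility(gwmee_version, scheme):
    """Classify a GWMEE release for use with GDMA 2.7.1.

    scheme is 'http' or 'https'. Returns 'listed' for releases the notes
    say should be compatible, 'untested' for the plain-HTTP range the notes
    call possible but untested, and 'not listed' otherwise.
    """
    version = parse_version(gwmee_version)
    if scheme == "https":
        return "listed" if version >= (7, 1, 1) else "not listed"
    if scheme == "http":
        if version >= (7, 1, 0):
            return "listed"
        return "untested" if version >= (6, 7, 0) else "not listed"
    raise ValueError("scheme must be 'http' or 'https'")

def auto_setup_supported(gdma_version, gwmee_version):
    """Auto-Setup needs GDMA 2.6.1 or later and GWMEE 7.2.0 or later."""
    return (parse_version(gdma_version) >= (2, 6, 1)
            and parse_version(gwmee_version) >= (7, 2, 0))

result = gwmee_compatibility("7.1.0", "https")
```

In this example result is "not listed", because HTTPS requires GWMEE 7.1.1 or later. The sketch does not model the server-side tarball overlay needed for Auto-Setup on GWMEE 7.2.0, nor the special modification needed for HTTP/S result transport on GWMEE 7.2.1.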

More information

For more extensive coverage of the features in GDMA 2.7.1, see also the Release Notes for GDMA 2.7.0. For information on installing and other specific aspects of interest, see How to install GDMA.

Installation notes when using HTTPS with GDMA

Any installation of GroundWork Monitor that uses HTTPS, which is highly recommended, must deal with TLS certificates and their maintenance. The recommended approach is to obtain TLS certs from a public or private Certificate Authority, depending on your particular situation. It is important to read the section titled Obtaining and installing TLS certificates with options in the document How to use GDMA with HTTPS, and to pay particular attention to the option settings needed at install time.