Why use a standby notification server?
If you have a requirement for redundancy that is not met by having a backup, you can use this option. Monitoring should generally be a continuous service that fails over automatically if the main server becomes unavailable. This is possible with GroundWork using the Standby Notification Server.
A second copy of the majority of your monitoring configuration is automatically synchronized to the standby server on the fly. Any changes you make to the Nagios Monitoring configuration and the notification manager are all synchronized. Downtimes are also synchronized.
A Standby server won't send notifications through the notification manager until and unless the Primary is no longer available or healthy. Once the Primary recovers, notifications from the Standby are once again suppressed.
Standalone or as Parent servers
You can use the Primary-Standby pair in standalone mode, that is, as standalone active check servers with double monitoring. This is also possible when using GDMA, as GDMA can have multiple target servers defined, and so report to both the Primary and Standby. Standby servers don't build externals automatically, so the GDMA systems will start complaining after a few days that they can't get new configurations without the Primary online.
The main way Primary and Standby servers are deployed, however, is as Parents of Parent-managed or Child-managed child servers. This guide covers setting up these deployment scenarios. You can, of course, mix these scenarios as needed to cover your particular deployment needs.
Not an HA solution
Standby Notification Server is not a full High Availability (HA) solution. Only specific tables of specific databases are replicated, and any active monitoring done by the Standby effectively duplicates that done by the Primary. There is no block-level synchronization of disks, redundant journaling, etc. Also, if you do fail over to standby operation, you can't make changes to the configuration until you recover the Primary server - it is not an "A/B" operation.
If your requirement is for a full HA solution, please contact email@example.com. We have such solutions for you, but they are not included with GroundWork and come at a premium cost.
You must have:
- Primary and Standby GroundWork hosts (servers)
- SSH (TCP/22) from Primary to Standby
- SSH (TCP/22) from Standby to Primary
- Postgres available on the Primary from the Standby (TCP/5432)
- You need to know the stable DNS name or IP address of both the Primary and Standby servers
- You probably want to set a non-default password for the replication role, so you should have one handy
- The GroundWork user ID must be the same on the Primary and the Standby, with the /gw8 install in its home directory. GroundWork 8 requires a user ID on its host; the particular ID doesn't matter (we use gwos), but it has to be the same on each system.
- The SSH key exchange required needs to be for this user, so it must be possible to SSH into the server as the GroundWork (gwos) user.
- The GroundWork user's home must contain the GroundWork 8 installation (gw8) subdirectory.
- If you need to change these parameters, you can, but the automated scripted installation will need to be adjusted.
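The layout and connectivity requirements above can be checked before you run the scripted setup. The following is a minimal sketch and is not part of the GroundWork bundle: the function names check_gw8_layout and check_tcp_port are hypothetical, and the port check assumes bash with /dev/tcp support and the GNU timeout utility.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; neither function ships with GroundWork.

# Succeeds if <home_dir>/gw8 exists as a directory.
check_gw8_layout() {
    local home_dir="${1:-}"
    [ -n "$home_dir" ] && [ -d "$home_dir/gw8" ]
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
# Returns 2 when called without both arguments.
check_tcp_port() {
    local host="${1:-}" port="${2:-}"
    if [ -z "$host" ] || [ -z "$port" ]; then
        return 2
    fi
    # /dev/tcp is a bash pseudo-device; timeout guards against filtered ports.
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, on the Standby, `check_gw8_layout /home/gwos && check_tcp_port primary.example.com 5432` should succeed before you proceed, where primary.example.com stands in for your Primary's address. The same port check with 22 covers the SSH requirement in each direction.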
GroundWork Monitor Enterprise
- GroundWork Monitor 8.1.2 and above. This procedure will not work with older versions.
If you already have an existing GroundWork 8.1.2 or higher server installed as a parent or a standalone, you can use this setup procedure to add a Standby server to it, making it a Primary. Of course you can add new GroundWork servers as well, and set them up clean in Primary-Standby pairs.
We provide a scripted procedure to set up a new Standby server to pair with an existing Primary server, as long as it meets the requirements. This procedure makes a backup for you, so you can revert and try again if you make a mistake or hit an error.
Throughout these setup procedures, we assume that you have both the Primary and Standby running, set up under the gwos user. In addition, the gw8 subdirectory must be located immediately below the gwos user's home directory, e.g., /home/gwos/gw8. This procedure will not work if you have installed GroundWork under the root user, or if you located the gw8 directory elsewhere in the file system. You can contact support if you need to verify that your servers are set up correctly.
Your choice of operational mode (Standalone or Parent) isn't relevant until and unless you decide to add Child servers. You can always switch the mode later if you prefer not to decide now, but we recommend having a complete plan before going through this procedure.
To set up the Primary server, the easiest way is to download the archive that contains all the required scripts, plugins, and configurations. This procedure requires an offline backup and will also restart the GroundWork server, so be prepared for an interruption in monitoring.
- Download the Primary file (under SNS). Place it in the gwos user's home directory; this must be the directory immediately above the gw8 directory, and the location where you land when connecting by SSH.
Expand the file:
tar zxvf sns-primary.tar.gz
Change the directory:
Execute the set-replication-primary.sh script, giving the DNS name or IP of the Standby server as the argument, and optionally the password for the replication role:
If you don't supply a password, it will generate one and show it to you so you can use it with the Standby. In either case, please note the password you use.
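As a sketch of the invocation described above, the hypothetical wrapper below (run_primary_setup is not part of the bundle) passes the Standby address to set-replication-primary.sh, followed by the password only if one was given. It assumes the password is the second argument, as the description suggests. The directory that holds the script is taken as a parameter because it depends on where the archive was unpacked.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around set-replication-primary.sh.
# Arguments:
#   $1  directory containing set-replication-primary.sh
#   $2  DNS name or IP address of the Standby server
#   $3  optional password for the replication role
run_primary_setup() {
    local script_dir="${1:-}" standby_host="${2:-}" password="${3:-}"
    if [ -z "$script_dir" ] || [ -z "$standby_host" ]; then
        echo "usage: run_primary_setup <script_dir> <standby_host> [password]" >&2
        return 2
    fi
    if [ ! -f "$script_dir/set-replication-primary.sh" ]; then
        echo "set-replication-primary.sh not found in $script_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    if [ -n "$password" ]; then
        ( cd "$script_dir" && bash ./set-replication-primary.sh "$standby_host" "$password" )
    else
        # No password given: the script generates one and displays it.
        ( cd "$script_dir" && bash ./set-replication-primary.sh "$standby_host" )
    fi
}
```

A call might look like `run_primary_setup /path/to/unpacked/dir standby.example.com`, where both the path and the host name are placeholders for your own values.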
- When prompted, press Enter to continue.
When the process completes, copy the SSH key that appears on the screen to the Standby server in the gwos user's home .ssh directory, adding it to the authorized_keys file (if it exists), or creating this file if it does not. Make sure the file is restricted to the gwos user only, as are the other files in this directory.
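The key-copy step can be sketched as a small helper run as the gwos user on the receiving server. This is an illustration, not a GroundWork tool: install_authorized_key is a hypothetical name, and the 700 and 600 modes are the conventional OpenSSH permissions for the directory and the file rather than values taken from the setup scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a public key to <ssh_dir>/authorized_keys,
# creating the directory and file if needed, and restrict access to the owner.
install_authorized_key() {
    local ssh_dir="${1:-}" pubkey="${2:-}"
    if [ -z "$ssh_dir" ] || [ -z "$pubkey" ]; then
        echo "usage: install_authorized_key <ssh_dir> <public_key_line>" >&2
        return 2
    fi
    local auth_file="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" || return 1
    # If the existing file does not end with a newline, add one first
    # so the new key starts on its own line.
    if [ -s "$auth_file" ] && [ -n "$(tail -c1 "$auth_file")" ]; then
        printf '\n' >> "$auth_file" || return 1
    fi
    # Append only if the exact key line is not already present.
    if ! grep -qxF -- "$pubkey" "$auth_file"; then
        printf '%s\n' "$pubkey" >> "$auth_file" || return 1
    fi
    chmod 600 "$auth_file"
}
```

For example, `install_authorized_key "$HOME/.ssh" 'ssh-ed25519 AAAA... gwos@primary'`, with the quoted string replaced by the key line the script displayed. Running it twice with the same key leaves a single entry.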
If you notice any error messages, STOP. Restore from backup, and report the error to support. You can't run this process twice without manually deleting the replication publications and several other changes first. If you find you need to do this, we recommend you restore the backup you took when you started and start again. You can always study the set-replication-primary.sh shell script, or contact GroundWork Support if you find you need to adjust something.
Similarly (and only AFTER you configure the Primary server as above), you can set up the Standby server. Note this can be a clean install of GroundWork 8, as most of the important data will be replaced. The graphs, events, and report history collected by the server will be unique to it, but the configuration for Nagios Monitoring, SLA reporting, downtimes, and notifications are all replaced by those from the Primary.
- Download the Standby file (under SNS). Place it in the gwos user's home directory.
Expand the file:
tar zxvf sns-standby.tar.gz
Change the directory (depending on your location):
Execute the set-replication-standby.sh script, similarly, with the Primary server DNS name or IP address, and the password you used above:
The password must match the one used on the Primary, so do not proceed until you are sure you have it.
When the process completes, copy the SSH key that appears on the screen to the Primary server in the gwos user's home .ssh directory, adding it to the authorized_keys file (if it exists), or creating this file if it does not. Make sure the file is restricted to the gwos user only, as are the other files in this directory.
If you notice any error messages, STOP. Restore from backup, and report the error to support.
At this point, the Primary and Standby servers are linked, and any changes you make to Configuration > Nagios Monitoring or Configuration > Notifications are made on both servers. However, you still need to set up notification monitoring and management to avoid getting duplicate notifications. To do so, manually configure cross-monitoring as follows:
On the Primary server, access the Configuration > Nagios Monitoring > Control > Nagios Resource Macros section, and change the value of $USER18$ to the username of the GroundWork user (gwos, in our case). Update the macro value.
If you are already using $USER18$ for something else, use any other unused macro, but note it for adjusting this parameter later.
- Navigate to Nagios Monitoring > Profiles > Profile Importer > Import. Select Uploaded and import the profiles for Primary and Standby hosts and services (4 profiles in all).
- Create a new hostgroup for the pair (optional, but a good idea).
- Add the two servers using Configuration > Nagios Monitoring > Hosts > Host wizard. Apply the Primary and Standby host profiles to the respective hosts.
- Define a service dependency that keeps the dependent service from executing when the primary_health service is in an Unknown, Critical, or Warning state. We recommend adding a descriptive name like PrimaryNotWorking-suppress-turning-noma-off.
- Define a service dependency that keeps the dependent service from executing when the primary_health service is in an OK state. We recommend the title PrimaryWorking-suppress-turning-noma-on.
- Apply the dependencies to the noma_off and noma_on services on the Standby host you added, as shown.
- Commit the configuration. You will see a message at the bottom of the commit panel indicating commitscript.sh has run on the Standby, which commits the configuration automatically.
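For readers who think in terms of raw Nagios objects, the two service dependencies in the steps above correspond roughly to servicedependency definitions with different execution_failure_criteria values. The sketch below only prints such definitions for illustration. GroundWork generates the real objects from the UI. The function name print_service_dependency and the host names in the usage note are hypothetical, and which host carries the primary_health service is an assumption you should check against the imported profiles.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: print a Nagios servicedependency object that
# blocks execution of a dependent service based on the primary_health state.
# Arguments:
#   $1  host that carries the primary_health service (assumption)
#   $2  host that carries the dependent service (the Standby)
#   $3  dependent service name (noma_off or noma_on)
#   $4  execution_failure_criteria: o=OK, w=Warning, u=Unknown, c=Critical
print_service_dependency() {
    local master_host="${1:-}" dependent_host="${2:-}"
    local dependent_service="${3:-}" criteria="${4:-}"
    if [ -z "$master_host" ] || [ -z "$dependent_host" ] ||
       [ -z "$dependent_service" ] || [ -z "$criteria" ]; then
        echo "usage: print_service_dependency <master_host> <dependent_host> <service> <criteria>" >&2
        return 2
    fi
    cat <<EOF
define servicedependency {
    host_name                      $master_host
    service_description            primary_health
    dependent_host_name            $dependent_host
    dependent_service_description  $dependent_service
    execution_failure_criteria     $criteria
}
EOF
}
```

Under these assumptions, `print_service_dependency primaryhost standbyhost noma_off w,u,c` corresponds to PrimaryNotWorking-suppress-turning-noma-off, and `print_service_dependency primaryhost standbyhost noma_on o` corresponds to PrimaryWorking-suppress-turning-noma-on.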
Procedure: Connecting Child servers to Primary and Standby
This optional section is relevant if you use Child servers (either Parent-managed child or Child-managed child). If you are just using the Standby and Primary as a pair, you can skip this section.
When using child servers, both the Primary and Standby will be in Parent mode, and you will have at least one Child server operating in Parent-managed or Child-managed mode (or more than one of each mode).
Parent-managed child servers
In this case, you will already have configured a Parent-managed child server to connect to the Primary as described in the Deploying Parent Child documentation. To connect each Parent-managed child to the Standby as well:
- Add the credentialed user to the Standby under Administration > Users, just as is described in the Deploying Parent Child link above.
- Access the Primary server and browse to the Configuration > Connectors menu option.
- Click on the existing child server connector:
- Click on the GroundWork Connections tab, and click New Child Connection:
- Add an entry for the Standby server like this, where standbyhost is the instance name of the Standby parent server:
Child-managed child servers
- To connect a Child-managed child server to the Standby, first add the credentialed user to the Standby under Administration > Users.
- Go to the Configuration > Connectors menu option.
- Click the Local Nagios connection to open the details page:
- Click the GroundWork Connections tab:
You probably already have a parent connection to the Primary, since this is already a Child-managed child server, and you already set it up according to the documentation. To add a new connection to the Standby parent, just click Connect to Parent, and replace the default name of the parent with the name of the Standby:
That's it. The Nagios monitored inventory on the Child will now post results to both the Primary and Standby parents.
Exceptions and Future Enhancements
The Standby notification server is limited in its role. It is not an exact copy of the Primary. There are many reasons to think of it as one, but there are important differences and exceptions. These are listed here.
Note that most of these exceptions are mitigated in the case of using child servers that do most of the monitoring, and that report results (over TCG) to both Primary and Standby parents.
Here's what's not duplicated:
- Cloud Hub: Cloud Hub configurations are not copied to the Standby server. As Cloud Hub is generally very easy to configure, however, this is not much of an exception. Future versions of the Standby notification server may include Cloud Hub connector configuration copies and associated double monitoring.
- Network Discovery: NeDi data is not synchronized to the Standby notification server. This includes network data capture containers. This may be added in a future version as well.
- Connectors: Other Transit Connection Generator (TCG) connectors, such as the Elastic connector, are not replicated to the Standby. As these are typically deployed independent of GroundWork Monitor itself, this is by design.
- GDMA externals, auto-setup instructions: If you are using GDMA without Child servers, you can set the reporting of results to go to two (or more) target servers by listing them as a comma-delimited list in the Target Server directive of the GDMA agent, typically in the host external. See the GDMA documentation for more information. However, GDMA will only look for new configuration files at the first member of this list, typically the Primary server. This means that should the Primary server fail, GDMA will not accept new changes to its configuration. This is usually not an issue. The GDMA will continue operating for the time defined in the time value (ref), which is 3 days by default. This is the time you have in which to recover the Primary server.
You can also configure GDMA to connect and report to Child servers, which removes the requirement to list multiple targets. However, in this case the target proxies the configuration file request to the Parent, and this is also set to the Primary, so you still need to recover it within the time allotted.
Auto-setup instructions and triggers are also treated in the same way as externals as far as GDMA is concerned. They are installed on the Primary, and placing them on the Standby will have no effect.
- Nagios Notifications: In some cases using the notification manager for notifications is not sufficiently robust, or you may have a significant investment in the configuration of Nagios for notification that you wish to continue to use. As of version 1.0.0, the Standby server does not enable or disable notifications sent through Nagios itself. This is planned for a future version.
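To illustrate the GDMA behavior described in the GDMA externals item above: results go to every server in the comma-delimited target list, but new configuration is fetched only from the first entry, and the agent keeps running for a limited time (3 days by default) without it. The sketch below models just that logic. The function names are hypothetical and it does not read any real GDMA configuration file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that model the GDMA target-list behavior.

# Print the first entry of a comma-delimited target list, i.e. the server
# GDMA asks for new configuration files (typically the Primary).
gdma_config_source() {
    local targets="${1:-}"
    local first="${targets%%,*}"
    # Trim leading and trailing whitespace around the entry.
    first="${first#"${first%%[![:space:]]*}"}"
    first="${first%"${first##*[![:space:]]}"}"
    printf '%s\n' "$first"
}

# Given the time the Primary failed (seconds since the epoch) and an optional
# grace period in days (default 3), print the epoch time by which the Primary
# must be recovered.
gdma_recovery_deadline() {
    local failed_at="${1:-0}" days="${2:-3}"
    printf '%s\n' "$(( failed_at + days * 86400 ))"
}
```

For example, `gdma_config_source "primary.example.com, standby.example.com"` prints primary.example.com, and `gdma_recovery_deadline "$(date +%s)"` gives the latest recovery time under the default 3-day window. The host names are placeholders.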