Nagios performs checks using dedicated worker processes, whose number is set by the check_workers parameter. If your configuration does not have enough dependencies built in, thousands of check results can arrive at the same moment and flood the Nagios system. Such an overload can cause Nagios to stop operating altogether. The chance of this is slightly higher in a container than when Nagios runs directly on a host. That said, we have only encountered this overload in the lab, at loads well beyond the theoretical limits of normal configurations.
The check_workers parameter is adjustable in Nagios, and GroundWork exposes all such parameters in the user interface. This article explains how to set up Nagios to handle a very large flood of simultaneous results. If you think your configuration could submit more than 10,000 simultaneous results (for example, because a freshness threshold is set to the same timeout on 10,000 services), then you may want to set this parameter manually.
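One rough way to judge whether a configuration is at risk is to count how many services share the same freshness threshold. The sketch below assumes the service definitions are available as text in standard Nagios object syntax (define service { ... }). The function names and the limit parameter are hypothetical, chosen for illustration. It only sees thresholds written directly in a service definition and does not resolve values inherited from templates, so treat the counts as a lower bound.

```python
import re
from collections import Counter

# Matches one "define service { ... }" block in Nagios object-config syntax.
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)
# Matches a "freshness_threshold <seconds>" directive inside a block.
THRESHOLD = re.compile(r"^\s*freshness_threshold\s+(\d+)", re.MULTILINE)


def count_freshness_thresholds(config_text):
    """Return a Counter mapping freshness_threshold (seconds) to service count."""
    counts = Counter()
    for block in SERVICE_BLOCK.findall(config_text):
        match = THRESHOLD.search(block)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def flood_risks(config_text, limit=10_000):
    """Return the thresholds shared by at least `limit` services (hypothetical cutoff)."""
    counts = count_freshness_thresholds(config_text)
    return {seconds: n for seconds, n in counts.items() if n >= limit}
```

If flood_risks returns a non-empty result for your service definitions, the configuration may be a candidate for the manual setting described below.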
Setting Miscellaneous Directives
- In the Configuration > Nagios Monitoring > Control screen, expand the left-hand menu to show the Nagios main configuration menu. Click the last item, Miscellaneous Directives.
- In the resulting screen, enter check_workers in the Directive name field, and 11 in the Value field.
- Click Add Directive.
- Click Save and Done. The directive becomes active on the next commit.
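After the commit, you may want to confirm that the directive reached the generated Nagios main configuration file. The sketch below assumes the directive is written there in the usual name=value form (check_workers=11). The function name is hypothetical, and the location of the main configuration file depends on your installation, so the code takes the file contents as text rather than a path.

```python
import re

# Matches an uncommented "check_workers=<n>" line in main-config syntax.
CHECK_WORKERS = re.compile(r"^\s*check_workers\s*=\s*(\d+)\s*$", re.MULTILINE)


def read_check_workers(main_config_text):
    """Return every check_workers value found in the text, in file order."""
    return [int(value) for value in CHECK_WORKERS.findall(main_config_text)]
```

With the steps above applied, the expected result is a list containing the single value 11. An empty list suggests the commit has not yet been performed or the directive was not saved.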
The value of 11 was arrived at by careful testing and experimentation. Setting it too high is not advisable, as there is a significant memory penalty. We consider 11 to be as high as one can reasonably go without raising the minimum specification for a GroundWork server, and this level is stable when processing 10,000 simultaneous check results every 2 minutes, indefinitely. If the results are spread out even a little, this issue is rarely encountered.
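To put the tested load in perspective, the arithmetic can be sketched as follows. This is an illustration only. It assumes results are divided evenly among the workers, which simplifies how Nagios actually distributes work, and the function name is hypothetical.

```python
def burst_load(results_per_burst, burst_interval_s, workers):
    """Return (results per worker per burst, average results per second)."""
    per_worker = results_per_burst / workers
    per_second = results_per_burst / burst_interval_s
    return per_worker, per_second


# Tested scenario from this article: 10,000 results every 2 minutes, 11 workers.
per_worker, per_second = burst_load(10_000, 120, 11)
print(f"{per_worker:.0f} results per worker per burst")   # 909
print(f"{per_second:.1f} results per second on average")  # 83.3
```

The average rate is modest. The difficulty comes from all 10,000 results arriving at once, which is why spreading them out even slightly tends to avoid the problem.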