Syslog Monitoring Testing and Troubleshooting
Overview of testing and troubleshooting
As you develop and tune your regular expressions for syslog-message matching, it is inevitable that something or other will not work as expected. This page is a guide to discovering what went wrong and understanding what to do about it.
Advice on configuration
The first piece of advice is: do not stray far off the beaten path. There are many ways in which the setup can fail to work as intended if you tweak the rsyslogd daemon options or the Telegraf syslog input plugin options. As we say at the end of Best Practices:
Remember, the best way to recycle hindsight is to use it as foresight. Using it as mulch is way less effective.
GroundWork has already done the work of discovering how fragile the configuration can be. You should learn from our experience.
That said, what we describe in our setup is a fairly simple use of rsyslogd. It does not cover using TLS for the TCP connection between the rsyslogd daemon and the tsyslogd container. Nor does it cover making that connection even more reliable (beyond using TCP instead of UDP) by configuring a queue at the sending end. Those topics and others are mentioned in the Advanced Setup section of the Syslog Monitoring Installation page, and may be covered in future versions of these pages. There may be other useful rsyslogd options that we have not considered.
A process for troubleshooting
The strategy for testing the configuration involves:
- At each stage, factor out parts that are not directly relevant. Sympathy will not accrue to those who ignore that advice. For instance, to check that you are getting messages from a remote device or host, you might start with an initial end-to-end smoke test that involves the entire data path. But if that fails, take the tsyslogd container out of your test scenario, and just make sure that your local rsyslogd daemon is able to receive messages from the remote system and write them to a logfile on the GroundWork server. If that doesn't work, you know where to concentrate your efforts, without extra complication. If that does work, ignore the remote system and just try to get your local rsyslogd daemon to forward messages to the tsyslogd container, without regard to how they get classified by your regular expressions. Once you have seen both of those parts working, check if you can see them working in tandem. And only at that point should you worry about the details of whether you have the right regular expressions in play for the remote machine, and other details of the classifier.
- Use simple tools to generate test messages and send them into the system.
- Keep careful track of what has worked and what has not, and under what conditions. In a difficult troubleshooting session, you may find yourself testing many scenarios. If you don't keep track of results along the way, you will end up confused about what you have already ruled out, and you will forget which test scaffolding and special settings you put in place and now need to back out before final re-testing.
In this section, we describe the parts and pieces you will need for testing, in roughly the order you will need them.
Sending custom test messages
It's near-impossible to test and troubleshoot without some means to submit specific messages at will. There are a few standard tools you can use to do this.
logger is the standard command-line tool for user scripts to send messages to syslog.
In theory, you might use the logger command-line program to send some test messages directly to the tsyslogd container. However, the capabilities of logger have evolved over time, and even the logger delivered in CentOS 7 does not have all the facilities we would want to see for such testing. Therefore, the procedures we outline here are more indirect. First we check that we can use logger to send messages to rsyslogd and get them logged in a system file, so we know that connection is working. Then we verify that rsyslogd is forwarding messages to the tsyslogd container, as a second step. Once that is done, you will be able to use logger to send whatever types of test messages you want, by using the rsyslog service as a relay.
There are multiple things you need to know about logger and its use as a test command.
- It seems that if you do not supply the logger -n $HOST option, nothing is sent to the local syslog daemon. This is highly confusing, because you think you are typing the right test command, but nothing happens.
- The logger --stderr option appears to be broken. It prints the syslog message to the standard error stream if you do not supply the logger -n $HOSTNAME option, but in that situation nothing is sent to the syslog daemon. If you do use the logger -n $HOSTNAME option, nothing is printed to the standard error stream. There is no clear information as to why this option is not following the behavior documented in the logger man page.
- If you try to send a message with a facility of kern, that will be silently converted to a facility of user. This action is documented in the logger man page, but in casual testing, it is likely that you would bump into this long before you read the man page, and get mighty confused by it. For that reason, in the commands we present here, we always use a facility of user, so there will be no question of such a hidden transform occurring.
- In our own testing, the logger -u /dev/log option had no effect that we could discern. (That socket path is drawn from the rsyslogd man page.) Therefore, so as not to create any confusion as to what is happening, we do not ever attempt to run test messages into the system via that channel. Instead, we always use a UDP or TCP port.
- The logger -T option defaults to using port 601/tcp (the syslog-conn port in /etc/services), not 514/tcp. So if you want to send messages via TCP, you will want to either set up /etc/rsyslog.conf to open TCP port 601 (as opposed to TCP port 514, which is mentioned instead in the default /etc/rsyslog.conf file), or use the logger -T -P 514 option. Because port 601/tcp is supposed to use the BEEP protocol, and rsyslog support of the BEEP protocol is not complete and not recommended for general use, we choose to use port 514/tcp even though it is not officially sanctioned for that usage. We show that port usage in our /etc/rsyslog.conf setup instructions under Configuring the rsyslog service to receive syslog messages, and use an explicit port number on our logger commands. If you don't like that choice, or if you already have some other port set up for rsyslog receipt of syslog messages over TCP on your GroundWork server, you can make the obvious adjustments. See Selecting port numbers for the tsyslogd container and SELinux constraints on port numbers for more information.
A typical test command sends a message with priority user.err and gets it logged where we can see it. A logger command to send such a message via TCP would be:
logger -n $HOSTNAME -T -P 514 -p user.err Testing testing 123
netcat is a general-purpose tool for transferring data over sockets. As such, it can be used to send a message to an rsyslogd daemon. The Telegraf syslog input plugin documentation suggests a couple of such commands. However, doing this correctly requires that you get very familiar with the details of all the fields in a properly-formatted syslog message, and that seems generally unproductive except in the most extreme troubleshooting situations. Therefore, we recommend using the logger command instead, for sending messages.
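For those extreme situations, the sketch below shows how the numeric priority at the front of a syslog message is derived from the facility and severity, and assembles one RFC 5424-style line. It is illustrative only: the function names and the gwtest application name are hypothetical, and the message framing that the receiving end expects is not covered here.

```shell
# Illustrative helpers; the names are hypothetical and not part of any
# GroundWork tooling.

# PRI = facility code * 8 + severity code (user = 1, err = 3).
syslog_pri() {
    echo $(( $1 * 8 + $2 ))
}

# Assemble one RFC 5424-style line:
#   <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
# PROCID, MSGID, and STRUCTURED-DATA are left as "-" (nil values).
# "gwtest" is an invented APP-NAME used only for this example.
rfc5424_msg() {
    printf '<%s>1 %s %s %s - - - %s\n' \
        "$(syslog_pri 1 3)" \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "${HOSTNAME:-localhost}" \
        "gwtest" \
        "$*"
}

# Print a sample user.err message to standard output.
rfc5424_msg Testing testing 123
```

For user.err the leading priority works out to 11. The printed line could then be piped into netcat aimed at the receiving port, which is where the field-level details start to matter.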
A more-interesting use of netcat is to see precisely what the rsyslog service is sending to the tsyslogd container. Shut down the container, then run:
netcat -l 26514
using whatever port you have chosen for the tsyslogd container to use externally. Essentially, netcat takes on the role of the container. Then in a different terminal window, send some messages to the rsyslog service using logger. If the rsyslog service is properly forwarding messages, you will see them in your terminal window as the output of the netcat command.
socat can be considered to be an advanced netcat. As such, if you're going to learn such a new tool, perhaps this is the one to choose.
An rsyslogd output file
Once you know how to send a message to the local rsyslogd daemon on the GroundWork server, the first order of business is to prove that the daemon is actually receiving and processing it. A typical /etc/rsyslog.conf config file will contain this setting:
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none /var/log/messages
Note that this line can be somewhat misleading in some circumstances. In our own lab environment (admittedly a messy context), we have sometimes seen the /var/log/messages file be rotated while the rsyslogd daemon continues to log into the rotated (and therefore renamed) logfile. It may be useful to run "ls -ltr /var/log/messages*" and see what file is actually receiving the most-recent messages. If it is not /var/log/messages itself, you should be able to bounce the rsyslogd daemon (service rsyslog restart or systemctl restart rsyslog) and get it logging again in the expected file. If this happens to you, you will probably want to track down the source of the difficulty and solve it on a more-permanent basis.
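The rotation check described above can be sketched as a pair of small shell functions. This is a minimal sketch, not a supported tool: the function names are hypothetical, and it assumes rotated files keep /var/log/messages as a name prefix.

```shell
# Hypothetical helper: print the most recently modified file whose name
# starts with the given base path (default: /var/log/messages).
newest_log() {
    ls -t -- "${1:-/var/log/messages}"* 2>/dev/null | head -n 1
}

# Report whether the base file itself is the most recently written one.
# Returns non-zero if some other (for example, rotated) file is newer.
check_log_target() {
    base=${1:-/var/log/messages}
    newest=$(newest_log "$base")
    if [ "$newest" = "$base" ]; then
        echo "ok: $base is the most recently written file"
    else
        echo "warning: most recent writes went to ${newest:-no matching file}"
        return 1
    fi
}
```

Running check_log_target with no arguments inspects /var/log/messages. A warning suggests bouncing the rsyslogd daemon as described above.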
With that in play, along with the setup described under Configuring the rsyslog service to receive syslog messages to have rsyslogd open up a local TCP port, you can use that standard setup for testing. Open two terminal windows on your GroundWork server. In one of them, run:
tail -f /var/log/messages
In the other terminal window, run the following command. $HOSTNAME is a shell or environment variable which is typically predefined for you in your terminal window.
logger -n $HOSTNAME -T -P 514 -p user.err This message should be received and logged.
You should see that message show up in the logfile. If it does not, check your rsyslogd setup, and make sure you have bounced that daemon to pick up any changes.
Since you may be typing that command a lot while testing, here is a bash alias for it.
alias lg="logger -n $HOSTNAME -T -P 514 -p user.err This message should be received and logged."
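Because the alias sends identical text every time, a new log entry can be hard to tell apart from an older one. The sketch below tags each test message with a unique marker and then polls the logfile for it. The function names and the marker format are invented for illustration, and the logger options simply repeat the ones used above.

```shell
# Hypothetical helpers for marker-based test messages.

# Build a marker that is unlikely to collide with earlier test runs.
make_marker() {
    printf 'gwtest-%s-%s' "$(date +%s)" "$$"
}

# Send one user.err test message carrying the given marker.
# Uses the same logger options as the alias above.
send_marked() {
    logger -n "$HOSTNAME" -T -P 514 -p user.err \
        "$1 This message should be received and logged."
}

# Poll a logfile for the marker, once per second.
#   $1 = marker, $2 = logfile (default /var/log/messages),
#   $3 = number of attempts (default 5)
wait_for_marker() {
    wanted=$1
    logfile=${2:-/var/log/messages}
    tries=${3:-5}
    while [ "$tries" -gt 0 ]; do
        grep -qF -- "$wanted" "$logfile" 2>/dev/null && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

A typical sequence would be: m=$(make_marker); send_marked "$m"; wait_for_marker "$m" && echo logged. Reading /var/log/messages usually requires root privileges.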
The tsyslogd container log
You can view the end of the log from the tsyslogd container using the following command, executed from within the directory that contains your docker-compose configuration:
docker-compose logs --tail=5 -f tsyslogd
Leave that running in a terminal window while you execute tests in other terminal windows, to see what gets logged as the container operates. Since you may be typing that command a lot while testing, here is a bash alias for it.
alias dcl="docker-compose logs --tail=5 -f tsyslogd"
Log-control settings in the telegraf-syslog.conf configuration file
There are four key settings in the telegraf-syslog.conf configuration file. We show them here in a fairly logical sequence, which is not the order in which they appear in the file. All of these items are set in the delivered config file to minimize the amount of logging from the tsyslogd container. So to see anything interesting, you will need to alter these settings and bounce the container to pick up the changes.
When interpreting these settings, it helps to have a mental picture of the sequence of plugins that the data will flow through inside Telegraf within the tsyslogd container. Using the notation used for the respective configurations of those plugins in the telegraf-syslog.conf file, we have in order:
[[inputs.syslog]]
[[processors.printer]] (a tap for logging at this stage, if configured; otherwise ignored)
[[processors.classify]]
[[outputs.file]] (a tap for logging at this stage, if configured; otherwise ignored)
[[outputs.groundwork]]
The output of [[processors.classify]] goes to both [[outputs.file]] and [[outputs.groundwork]] in parallel; there is no special sequencing between those two tail-end plugins.
The quiet option
[agent]
  quiet = true
The quiet option determines whether Telegraf should log only error-level messages. This is not full control over all logging, but setting it to true does bring the noise down as much as possible given other settings. For debugging of the syslog input plugin or the classify processor plugin themselves, you will want to either comment this out or set it to false. That said, this option does not control logging from either the processors.printer plugin or the outputs.file plugin, so if all you need is the summary lines at that level, you can leave the quiet option set to true.
The debug option
[agent]
  # debug = false
The debug option determines the level of log messages that will appear in the log from the plugins we care most about, primarily the classifier. To see all the available detail from the classifier in the container log, you will want to uncomment this option and set it to true.
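Before and after editing, it can help to confirm the current state of both agent options, including commented-out lines. The following is a minimal sketch: the function name is hypothetical, and the path to telegraf-syslog.conf is passed in as an argument because its location depends on your installation.

```shell
# Hypothetical helper: list the quiet and debug settings (active or
# commented out) in the given config file, with line numbers.
#   $1 = path to telegraf-syslog.conf
show_log_settings() {
    grep -nE '^[[:space:]]*#?[[:space:]]*(quiet|debug)[[:space:]]*=' "$1"
}
```

For example, show_log_settings ./telegraf-syslog.conf prints one line per matching setting, so an option that is still commented out is easy to spot before you bounce the container.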
The processors.printer plugin
The processors.printer plugin is positioned in the chain of plugins immediately after the syslog input plugin, by dint of its order directive having a smaller value than the order directive of the processors.classify plugin. Setup for the processors.printer plugin is all commented out as delivered. It is there in case you wish to log the output of the syslog input plugin, showing successful receipt of syslog messages. Uncommenting the setup for this plugin allows you to see exactly when the syslog input plugin received a message and was able to parse it and send it downstream. The logged output from this plugin will look similar to the following, all on one line but folded here manually for easier viewing.
syslog,appname=myusername,facility=user,hostname=myhost.com,severity=err
    timestamp=1658200282000000000i,message=" Testing testing 123",
    version=1i,facility_code=1i,severity_code=3i 1658200282111817490
That information is all logged in standard InfluxDB Line Protocol format (measurement name, tags, fields, timestamp).
The outputs.file plugin
The outputs.file plugin is positioned in the chain of plugins after the processors.classify plugin, to allow you to easily see the classification results. Setup for the outputs.file plugin is all commented out as delivered. It is there in case you wish to log the output of the classify processor plugin, in compact form. Uncommenting the setup for this plugin will log output similar to the following, all on one line but folded here manually for easier viewing.
syslog,appname=myusername,facility=user,hostname=myhost.com,severity=err,status=SERVICE_UNKNOWN
    message=" Testing testing 123",version=1i,facility_code=1i,severity_code=3i,
    timestamp=1658202259000000000i 1658202259284942617
That information is all logged in standard InfluxDB Line Protocol format (measurement name, tags, fields, timestamp). Of particular note is the status tag, which contains the classification result that will be used as the status of the syslog service on the host.
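When many messages scroll by, pulling a single tag out of each line makes the classification results easier to scan. The sketch below is an assumption-laden illustration: the function name is hypothetical, it expects each input line to be one unfolded line-protocol record with no log prefix in front of it, and it assumes tag values contain no escaped spaces or commas.

```shell
# Hypothetical helper: print the value of one tag from InfluxDB Line
# Protocol records read on standard input.
#   $1 = tag name, for example: status
lp_tag() {
    awk -v tag="$1" '{
        # The first whitespace-delimited field holds "measurement,tag=value,...".
        n = split($1, parts, ",")
        for (i = 2; i <= n; i++) {
            eq = index(parts[i], "=")
            if (substr(parts[i], 1, eq - 1) == tag)
                print substr(parts[i], eq + 1)
        }
    }'
}
```

Feeding the sample record shown above through lp_tag status would print SERVICE_UNKNOWN. If your log viewer adds a prefix to each line, strip it before piping the lines into this function.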
What to test
It can help to have three terminal windows open side-by-side when looking at the tsyslogd container log:
- one terminal in which to type in logger commands
- one terminal in which to tail /var/log/messages to observe that the messages were seen by the rsyslogd daemon
- one terminal in which to tail the tsyslogd container log to see what is going on inside the container as messages arrive
Check the open ports on your rsyslog service
To check whether rsyslogd has the expected port open:
netstat -pantu | grep rsyslogd
You should see output similar to this:
% netstat -pantu | grep rsyslogd
tcp   0   0 0.0.0.0:514   0.0.0.0:*   LISTEN   27519/rsyslogd
tcp6  0   0 :::514        :::*        LISTEN   27519/rsyslogd
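If you want only the port numbers, for instance to compare them against the port used in your logger commands, the netstat output can be reduced with a small filter. This is a sketch under the assumption that the output has the column layout shown above; the function name is hypothetical.

```shell
# Hypothetical helper: read "netstat -pantu" output on standard input and
# print the distinct TCP ports on which rsyslogd is listening.
rsyslog_tcp_ports() {
    awk '$NF ~ /rsyslogd$/ && $0 ~ /LISTEN/ {
        # Column 4 is the local address; the port follows the last colon,
        # which also handles IPv6 forms such as ":::514".
        n = split($4, addr, ":")
        print addr[n]
    }' | sort -un
}
```

Typical use would be netstat -pantu | rsyslog_tcp_ports. With the sample output above, both the IPv4 and IPv6 listeners reduce to a single port number.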
There are major security risks in having an rsyslog server open to the world on port 514. On CentOS, you can use rich firewall rules to limit incoming connections on port 514 to whitelisted IP ranges. Examples:
# firewall-cmd --permanent --add-port=514/tcp
# firewall-cmd --permanent --add-port=514/udp
# firewall-cmd --reload
# firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="18.104.22.168/21" port port="514" protocol="tcp" accept'
# firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="22.214.171.124/21" port port="514" protocol="udp" accept'
# firewall-cmd --reload
That is just a tiny sample of what you might do for security, to at least make you aware of such concerns. For us here, that is a peripheral topic as regards troubleshooting syslog-message processing.
Test direct submission of messages from the GroundWork server
Use the following steps to test submission of messages via the local rsyslogd daemon.
- Tail the /var/log/messages file, and run the logger command on the GroundWork server to send messages to the rsyslogd daemon. Verify that they were received and logged.
- Adjust the configuration details in the telegraf-syslog.conf configuration file to give yourself adequate visibility into what is going on inside the container.
- Bounce the tsyslogd container to pick up the changes.
- Run the logger command to send messages to the rsyslog service. Verify that the messages were received and logged by the rsyslogd daemon.
- Look at the container log to verify that the messages were received by the Telegraf syslog input plugin, and so would be passed downstream to the classifier.
- From the container log data, check the classification and disposition of the messages.
- If the messages are not dropped by the classifier, you should be able to see the message data in the Status screen in GroundWork Monitor, under the syslog service on the host that is the GroundWork server itself.
Test submission of messages from a remote rsyslogd daemon
Use the following steps to test submission of messages from remote devices and hosts.
- Tail the /var/log/messages file on the GroundWork server so you can see evidence of incoming messages from remote hosts.
- Run the logger command on the remote host, or otherwise trigger a device, to send log messages to the GroundWork server. Verify that the messages were received and logged by the rsyslogd daemon on the GroundWork server, separate from whatever happens thereafter inside the tsyslogd container.
- Follow the same steps as in the preceding procedure, giving yourself visibility into the inner workings of the tsyslogd container and verifying receipt and processing of messages there, as forwarded from the rsyslogd daemon on the GroundWork server.
What to do as a last resort
In extreme circumstances, it is possible to shut down the rsyslog service and run the rsyslogd daemon manually, in debug mode:
service rsyslog stop
/usr/sbin/rsyslogd -dn 2>&1 | tee /tmp/rsyslog-debug-log
See the following references if you get to the point where you think that might be necessary.
You can interrupt with Ctrl-C when done, then restart the daemon:
service rsyslog start
Cleaning up after testing
Once you debug and fix whatever problem you faced, you will want to remove whatever test scaffolding you put in place. Such items may include:
- Turning off all extra in-container logging, including disabling any extra plugins you added while testing.
- Removing any extra logging you may have put into place from the rsyslog service.
- Possibly, disabling the syslog service port you used for debugging, if that will not be used in production.
- Reversing any firewall changes you made on a supposedly temporary basis.
- Reversing any SELinux port changes you made on a supposedly temporary basis.
- Restarting the rsyslog service.
- Restarting the container.