Syslog Monitoring Testing and Troubleshooting
Overview of testing and troubleshooting
As you develop and tune your regular expressions for syslog-message matching, it is inevitable that something or other will not work as expected. This page is a guide to discovering what went wrong and understanding what to do about it.
Advice on configuration
The first piece of advice is: do not stray far off the beaten path. There are many ways in which the setup can fail to work as intended if you tweak the rsyslogd daemon options or the Telegraf syslog input plugin options. As we say at the end of Best Practices:
Remember, the best way to recycle hindsight is to use it as foresight. Using it as mulch is way less effective.
GroundWork has already done the work of discovering how fragile the configuration can be. You should learn from our experience.
That said, what we describe in our setup is fairly simple use of rsyslogd. It does not cover using TLS for the TCP connection between the rsyslogd daemon and the tsyslogd container. Nor does it cover making that connection even more reliable (beyond using TCP instead of UDP) by configuring a queue at the sending end. Those topics and others are mentioned in the Advanced Setup section of the Syslog Monitoring Installation page, and may be covered in future versions of these pages. There may be other useful rsyslogd options that we have not considered.
A process for troubleshooting
The strategy for testing the configuration involves:
- At each stage, factor out parts that are not directly relevant. Sympathy will not accrue to those who ignore that advice. For instance, to check that you are getting messages from a remote device or host, you might start with an initial end-to-end smoke test that involves the entire data path. But if that fails, take the tsyslogd container out of your test scenario, and just make sure that your local rsyslogd daemon is able to receive messages from the remote system and write them to a logfile on the GroundWork server. If that doesn't work, you know where to concentrate your efforts, without extra complication. If that does work, ignore the remote system and just try to get your local rsyslogd daemon to forward messages to the tsyslogd container, without regard to how they get classified by your regular expressions. Once you have seen both of those parts working, check whether you can see them working in tandem. And only at that point should you worry about the details of whether you have the right regular expressions in play for the remote machine, and other details of the classifier.
- Use simple tools to generate test messages and send them into the system.
- Keep careful track of what has worked and what has not, and under what conditions. In a difficult troubleshooting session, you may find yourself testing many scenarios, and if you don't keep track of results along the way, you will end up confused, and you will forget about all the test scaffolding and special settings you put in place along the way that now need to be backed out before final re-testing.
Debugging tools
In this section, we describe the parts and pieces you will need for testing, in roughly the order you will need them.
Sending custom test messages
It's near-impossible to test and troubleshoot without some means to submit specific messages at will. There are a few standard tools you can use to do this.
Using logger
logger is the standard command-line tool for user scripts to send messages to syslog.
In theory, you might use the logger command-line program to send some test messages directly to the tsyslogd container. However, the capabilities of logger have evolved over time, and even the logger delivered in CentOS7 does not have all the facilities we would want to see for such testing. Therefore, the procedures we outline here are more indirect. First we check that we can use logger to send messages to rsyslogd and get them logged in a system file, so we know that connection is working. Then we verify that rsyslogd is forwarding messages to the tsyslogd container, as a second step. Once that is done, you will be able to use logger to send whatever types of test messages you want, by using the rsyslog service as a relay.
There are multiple things you need to know about logger and its use as a test command.
- It seems that if you do not supply the logger -n $HOST option, nothing is sent to the local syslog daemon. This will be highly confusing, because you think you are typing the right test command, but nothing happens.
- The logger --stderr option appears to be broken. It prints the syslog message to the standard error stream if you do not supply the logger -n $HOSTNAME option, but in that situation nothing is sent to the syslog daemon. If you do use the logger -n $HOSTNAME option, nothing is printed to the standard error stream. There is no clear information as to why this option is not following the behavior documented in the logger man page.
- If you try to send a message with a facility of kern, that will be silently converted to a facility of user. This action is documented in the logger man page, but in casual testing, it is likely that you would bump into this long before you read the man page, and get mighty confused by it. For that reason, in the commands we present here, we always use a facility of user, so there will be no question of such a hidden transform occurring.
- In our own testing, the logger -u /dev/log option had no effect that we could discern. (That socket path is drawn from the rsyslogd man page.) Therefore, so as not to create any confusion as to what is happening, we do not ever attempt to run test messages into the system via that channel. Instead, we always use a UDP or TCP port.
- The logger -T option defaults to using port 601/tcp (the syslog-conn service in /etc/services), not 514/tcp. So if you want to send messages via TCP, you will want to either set up /etc/rsyslog.conf to open TCP port 601 (as opposed to TCP port 514, which is mentioned instead in the default /etc/rsyslog.conf file), or use the logger -T -P 514 option. Because port 601/tcp is supposed to use the BEEP protocol, and rsyslog support of the BEEP protocol is not complete and not recommended for general use, we choose to use port 514/tcp even though it is not officially sanctioned for that usage. We show that port usage in our /etc/rsyslog.conf setup instructions under Configuring the rsyslog service to receive syslog messages, and use an explicit port number on our logger commands. If you don't like that choice, or if you already have some other port set up for rsyslog receipt of syslog messages over TCP on your GroundWork server, you can make the obvious adjustments. See Selecting port numbers for the tsyslogd container and SELinux constraints on port numbers for more information.
A typical test command sends a message with priority user.err and gets it logged where we can see it. A logger command to send such a message via TCP would be:
logger -n $HOSTNAME -T -P 514 -p user.err Testing testing 123
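If you also want to exercise the UDP path, a similar command should work, assuming you have enabled a UDP listener on port 514 in /etc/rsyslog.conf (the logger -d option selects UDP):
logger -n $HOSTNAME -d -P 514 -p user.err Testing testing 123 via UDP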
Using netcat
netcat is a general-purpose tool for transferring data over sockets. As such, it can be used to send a message to an rsyslogd daemon. The Telegraf syslog input plugin documentation suggests a couple of such commands. However, doing this correctly requires that you get very familiar with the details of all the fields in a properly-formatted syslog message, and that seems generally unproductive except in the most extreme troubleshooting situations. Therefore, we recommend using the logger command instead, for sending messages.
A more-interesting use of netcat is to see precisely what the rsyslog service is sending to the tsyslogd container. Shut down the container, then run:
netcat -l 26514
using whatever port you have chosen for the tsyslogd container to use externally. Essentially, netcat takes on the role of the container. Then in a different terminal window, send some messages to the rsyslog service using logger. If the rsyslog service is properly forwarding messages, you will see them in your terminal window as the output of the netcat command.
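As a concrete sketch of that procedure, assuming the example external port 26514 and the TCP port 514 rsyslog listener used elsewhere on this page:
# Terminal 1: stand in for the stopped tsyslogd container.
netcat -l 26514
# Terminal 2: push a test message through the local rsyslog service.
logger -n $HOSTNAME -T -P 514 -p user.err Testing forwarding to the container port
The forwarded message should appear in the first terminal; interrupt netcat with Ctrl-C when done, and restart the container.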
Using socat
socat can be considered to be an advanced netcat. As such, if you're going to learn such a new tool, perhaps this is the one to choose.
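For instance, a socat listener roughly equivalent to the netcat command shown above might look like this (again assuming the example external port 26514; adjust to match your setup):
socat -u TCP-LISTEN:26514,reuseaddr,fork STDOUT
The fork option lets socat keep serving new forwarded connections instead of exiting after the first one closes, which the simple netcat invocation does not do.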
An rsyslogd output file
Once you know how to send a message to the local rsyslogd daemon on the GroundWork server, the first order of business is to prove that the daemon is actually receiving and processing it. A typical /etc/rsyslog.conf config file will contain this setting:
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages
Note that said line may in some circumstances be somewhat misleading. In our own lab environment (admittedly a messy context), we have sometimes seen the /var/log/messages file be rotated, but the rsyslogd daemon continues to log into the rotated (and therefore renamed) logfile. It may be useful to run "ls -ltr /var/log/messages*" and see which file is actually receiving the most-recent messages. If it is not /var/log/messages itself, you should be able to bounce the rsyslogd daemon (service rsyslog restart or systemctl restart rsyslog) and get it logging again in the expected file. If this happens to you, you will probably want to track down the source of the difficulty and solve it on a more-permanent basis.
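If you want to see directly which logfiles the running daemon currently has open, and the lsof utility is installed on your system, something along these lines can help:
lsof -c rsyslogd | grep /var/log
A rotated-but-still-open logfile will show up here under its renamed path, confirming the situation described above.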
With that in play, along with the setup described under Configuring the rsyslog service to receive syslog messages to have rsyslogd open up a local TCP port, you can use that standard setup for testing. Open two terminal windows on your GroundWork server. In one of them, run:
tail -f /var/log/messages
In the other terminal window, run the following logger command. $HOSTNAME is a shell or environment variable which is typically predefined for you in your terminal window.
logger -n $HOSTNAME -T -P 514 -p user.err This message should be received and logged.
You should see that message show up in the logfile. If it does not, check your rsyslogd setup, and make sure you have bounced that daemon to pick up any changes.
Since you may be typing that command a lot while testing, here is a bash alias for it.
alias lg="logger -n $HOSTNAME -T -P 514 -p user.err This message should be received and logged."
The tsyslogd container log
You can view the end of the log from the tsyslogd container using the following command, executed from within your gw8 directory:
docker-compose logs --tail=5 -f tsyslogd
Leave that running in a terminal window while you execute tests in other terminal windows, to see what gets logged as the container operates. Since you may be typing that command a lot while testing, here is a bash alias for it.
alias dcl="docker-compose logs --tail=5 -f tsyslogd"
Log-control settings in the telegraf-syslog.conf configuration file
There are four key settings in the telegraf-syslog.conf configuration file. We show them here in a fairly logical sequence, which is not the order in which they appear in the file. All of these items are set in the delivered config file to minimize the amount of logging from the tsyslogd container. So to see anything interesting, you will need to alter these settings and bounce the container to pick up the changes.
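One way to bounce the container is to restart it with docker-compose from your gw8 directory. This assumes the telegraf-syslog.conf file is mounted into the container, so that a plain restart is enough for Telegraf to pick up your edits; if your setup differs, use whatever container-recreation procedure you normally follow:
docker-compose restart tsyslogd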
When interpreting these settings, it helps to have a mental picture of the sequence of plugins that the data will flow through inside Telegraf within the tsyslogd container. Using the notation used for the respective configurations of those plugins in the telegraf-syslog.conf file, we have in order:
[[inputs.syslog]]
[[processors.printer]]    (a tap for logging at this stage, if configured; otherwise ignored)
[[processors.classify]]
[[outputs.file]]          (a tap for logging at this stage, if configured; otherwise ignored)
[[outputs.groundwork]]
The output of [[processors.classify]] goes to both [[outputs.file]] and [[outputs.groundwork]] in parallel; there is no special sequencing between those two tail-end plugins.
The quiet option
[agent]
  quiet = true
The quiet option determines whether Telegraf should log only error-level messages. This is not full control over all logging, but having it set to true does bring the noise down as much as possible given other settings. For debugging of the syslog input plugin or the classify processor plugin themselves, you will want to either comment this out or set it to false. That said, this option does not control logging from either the processors.printer or outputs.file plugin, so if all you need is the summary lines at that level, you can leave the quiet option set to true.
The debug option
[agent]
  # debug = false
The debug option determines the level of log messages that will appear in the log from the plugins we care most about, primarily the classifier. To see all the available detail from the classifier in the container log, you will want to uncomment this option and set it to true.
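Before bouncing the container to pick up logging changes, it can be worth double-checking what the current settings actually are. Assuming you run the command from the directory holding your telegraf-syslog.conf file:
grep -nE 'quiet|debug' telegraf-syslog.conf
This lists both settings with their line numbers, so you can confirm your edits before restarting the container.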
The processors.printer plugin
# [[processors.printer]]
The processors.printer plugin is positioned in the chain of plugins immediately after the syslog input plugin, by dint of its order directive having a smaller value than the order directive of the processors.classify plugin. Setup for the processors.printer plugin is all commented out as delivered. It is there in case you wish to log the output of the syslog input plugin, showing successful receipt of syslog messages. Uncommenting the setup for this plugin allows you to see exactly when the syslog input plugin received a message and was able to parse it and send it downstream. The logged output from this plugin will look similar to the following, all on one line but folded here manually for easier viewing.
syslog,appname=myusername,facility=user,hostname=myhost.com,severity=err
    timestamp=1658200282000000000i,message=" Testing testing 123",
    version=1i,facility_code=1i,severity_code=3i 1658200282111817490
That information is all logged in standard InfluxDB Line Protocol format (measurement name, tags, fields, timestamp).
The outputs.file plugin
# [[outputs.file]]
The outputs.file plugin is positioned in the chain of plugins after the processors.classify plugin, to allow you to easily see the classification results. Setup for the outputs.file plugin is all commented out as delivered. It is there in case you wish to log the output of the classify processor plugin, in compact form. Uncommenting the setup for this plugin will log output similar to the following, all on one line but folded here manually for easier viewing.
syslog,appname=myusername,facility=user,hostname=myhost.com,severity=err,status=SERVICE_UNKNOWN
    message=" Testing testing 123",version=1i,facility_code=1i,severity_code=3i,
    timestamp=1658202259000000000i 1658202259284942617
That information is all logged in standard InfluxDB Line Protocol format (measurement name, tags, fields, timestamp). Of particular note is the status tag, which contains the classification result that will be used as the status of the syslog service on the myhost.com host.
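When the outputs.file tap is enabled, a quick way to pick the classification results out of the container log, assuming you run the command from your gw8 directory, is something like:
docker-compose logs --tail=100 tsyslogd | grep 'status='
Each matching line shows one classified message along with the status tag it was assigned.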
What to test
It can help to have three terminal windows open side-by-side when looking at the tsyslogd container log (the three commands are recapped just after this list):
- one terminal in which to type in logger commands
- one terminal in which to tail /var/log/messages to observe that the messages were seen by the rsyslogd daemon
- one terminal in which to tail the tsyslogd container log to see what is going on inside the container as messages arrive
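For reference, using the commands already shown on this page (the container-log command assumes it is run from your gw8 directory), the three terminals would be running:
# Terminal 1: send test messages.
logger -n $HOSTNAME -T -P 514 -p user.err Testing testing 123
# Terminal 2: watch the local rsyslogd daemon receive and log them.
tail -f /var/log/messages
# Terminal 3: watch the tsyslogd container process them.
docker-compose logs --tail=5 -f tsyslogd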
Check the open ports on your rsyslog service
To check whether rsyslogd has the expected port open:
netstat -pantu | grep rsyslogd
You should see output similar to this:
% netstat -pantu | grep rsyslogd
tcp        0      0 0.0.0.0:514       0.0.0.0:*      LISTEN      27519/rsyslogd
tcp6       0      0 :::514            :::*           LISTEN      27519/rsyslogd
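On newer systems where the netstat command is no longer installed by default, the ss utility from the iproute2 package provides comparable output:
ss -pantu | grep rsyslogd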
There are major security risks in having an rsyslog server open to the world on port 514. On CentOS, you can use rich firewall rules to limit incoming connections on port 514 to whitelisted IP ranges. Examples:
# firewall-cmd --permanent --add-port=514/tcp
# firewall-cmd --permanent --add-port=514/udp
# firewall-cmd --reload
# firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="123.123.123.0/21" port port="514" protocol="tcp" accept'
# firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="123.123.123.0/21" port port="514" protocol="udp" accept'
# firewall-cmd --reload
That is just a tiny sample of what you might do for security, to at least make you aware of such concerns. For us here, that is a peripheral topic as regards troubleshooting syslog-message processing.
Test direct submission of messages from the GroundWork server
Use the following steps to test submission of messages via the local rsyslogd daemon. A condensed end-to-end sketch follows the list.
- Tail the /var/log/messages file, and run the logger command on the GroundWork server to send messages to the rsyslogd daemon. Verify that they got received and logged.
- Adjust the configuration details in the telegraf-syslog.conf configuration file to give yourself adequate visibility into what is going on inside the container.
- Bounce the tsyslogd container to pick up the changes.
- Run the logger command to send messages to the rsyslog service. Verify that the messages got received and logged by the rsyslogd daemon.
- Look at the container log to verify that the messages were received by the Telegraf syslog input plugin, and so would be passed downstream to the classify plugin.
- From the container log data, check the classification and disposition of the messages.
- If the messages are not dropped by the classifier, you should be able to see the message data in the Status screen in GroundWork Monitor, under the syslog service on the host that is the GroundWork server itself.
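As a condensed sketch of the send-and-verify steps above, you can tag a test message with a unique string and then look for it at each stage. This assumes you have enabled the processors.printer or outputs.file tap described earlier (so that message text appears in the container log), and that the docker-compose command is run from your gw8 directory:
MSG="end-to-end test $(date +%s)"
logger -n $HOSTNAME -T -P 514 -p user.err "$MSG"
grep "$MSG" /var/log/messages
docker-compose logs --tail=100 tsyslogd | grep "$MSG"
If the message shows up in /var/log/messages but not in the container log, concentrate on the forwarding from the rsyslogd daemon to the tsyslogd container rather than on message generation or classification.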
Test submission of messages from a remote rsyslogd daemon
Use the following steps to test submission of messages from remote devices and hosts. A sketch of the remote-side command follows the list.
- Tail the /var/log/messages file on the GroundWork server so you can see evidence of incoming messages from remote hosts.
- Run the logger command on the remote host, or otherwise trigger a device, to send log messages to the GroundWork server. Verify that the messages got received and logged by the rsyslogd daemon on the GroundWork server, separate from whatever happens thereafter inside the tsyslogd container.
- Follow the same steps as in the preceding procedure, giving yourself visibility into the inner workings of the tsyslogd container and verifying receipt and processing of messages there, as forwarded from the rsyslogd daemon on the GroundWork server.
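If the remote host is a Linux machine with logger available, a direct way to generate such a test message is to point logger at the GroundWork server. Here gwserver.example.com is a placeholder for your actual GroundWork server name, and the command assumes the rsyslogd daemon there is listening on TCP port 514 as described earlier:
logger -n gwserver.example.com -T -P 514 -p user.err Test message from $(hostname)
If nothing shows up in /var/log/messages on the GroundWork server, check firewall rules and SELinux port settings before suspecting anything on the container side.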
What to do as a last resort
In extreme circumstances, it is possible to shut down the rsyslog service and run the rsyslogd daemon manually, in debug mode:
service rsyslog stop
/usr/sbin/rsyslogd -dn 2>&1 | tee /tmp/rsyslog-debug-log
See the following references if you get to the point where you think that might be necessary.
https://www.rsyslog.com/doc/master/troubleshooting/debug.html
https://www.rsyslog.com/doc/master/troubleshooting/debug.html#security-risks
You can interrupt with Ctrl-C when done, then restart the daemon:
service rsyslog start
Cleaning up after testing
Once you debug and fix whatever problem you faced, you will want to remove whatever test scaffolding you put in place. Such items may include:
- Turning off all extra in-container logging, including disabling any extra plugins you added while testing.
- Removing any extra logging you may have put into place from the rsyslog service.
- Possibly, disabling the syslog service port you used for debugging, if that will not be used in production.
- Reversing any firewall changes you made on a supposedly temporary basis.
- Reversing any SELinux port changes you made on a supposedly temporary basis.
- Restarting the rsyslog service.
- Restarting the container.