Overview of testing and troubleshooting

As you develop and tune your regular expressions for syslog-message matching, it is inevitable that something or other will not work as expected.  This page is a guide to discovering what went wrong and understanding what to do about it.

Advice on configuration

The first piece of advice is:  do not stray far off the beaten path.  There are many ways in which the setup can fail to work as intended if you tweak the rsyslogd daemon options or the Telegraf syslog input plugin options.  As we say at the end of Best Practices:

Remember, the best way to recycle hindsight is to use it as foresight. Using it as mulch is way less effective.

GroundWork has already done the work of discovering how fragile the configuration can be.  You should learn from our experience.

That said, what we describe in our setup is fairly simple use of rsyslogd.  It does not cover using TLS for the TCP connection between the rsyslogd daemon and the tsyslogd container.  Nor does it cover making that connection even more reliable (beyond using TCP instead of UDP) by configuring a queue at the sending end.  Those topics and others are mentioned in the Advanced Setup section of the Syslog Monitoring Installation page, and may be covered in future versions of these pages.  There may be other useful rsyslogd options that we have not considered.

A process for troubleshooting

The strategy for testing the configuration involves:

  • At each stage, factor out parts that are not directly relevant.  Sympathy will not accrue to those who ignore that advice.  For instance, to check that you are getting messages from a remote device or host, you might start with an initial end-to-end smoke test that involves the entire data path.  But if that fails, take the tsyslogd container out of your test scenario, and just make sure that your local rsyslogd daemon is able to receive messages from the remote system and write them to a logfile on the GroundWork server.  If that doesn't work, you know where to concentrate your efforts, without extra complication.  If that does work, ignore the remote system and just try to get your local rsyslogd daemon to forward messages to the tsyslogd container, without regard to how they get classified by your regular expressions.  Once you have seen both of those parts working, check if you can see them working in tandem.  And only at that point should you worry about the details of whether you have the right regular expressions in play for the remote machine, and other details of the classifier.
  • Use simple tools to generate test messages and send them into the system.
  • Keep careful track of what has worked and what has not, and under what conditions.  In a difficult troubleshooting session, you may find yourself testing many scenarios.  If you don't record the results along the way, you will end up confused, and you will lose track of all the test scaffolding and special settings you put in place, which need to be backed out before final re-testing.

Debugging tools

In this section, we describe the parts and pieces you will need for testing, in roughly the order you will need them.

Sending custom test messages

It's near-impossible to test and troubleshoot without some means to submit specific messages at will.  There are a few standard tools you can use to do this.

Using logger

logger is the standard command-line tool for user scripts to send messages to syslog.

In theory, you might use the logger command-line program to send some test messages directly to the tsyslogd container.  However, the capabilities of logger have evolved over time, and even the logger delivered in CentOS 7 does not have all the facilities we would want to see for such testing.  Therefore, the procedures we outline here are more indirect.  First we check that we can use logger to send messages to rsyslogd and get them logged in a system file, so we know that connection is working.  Then we verify that rsyslogd is forwarding messages to the tsyslogd container, as a second step.  Once that is done, you will be able to use logger to send whatever types of test messages you want, by using the rsyslog service as a relay.

There are multiple things you need to know about logger and its use as a test command.

  • It seems that if you do not supply the logger -n $HOST option, nothing is sent to the local syslog daemon.  This can be highly confusing:  you think you are typing the right test command, but nothing happens.
  • The logger --stderr option appears to be broken.  It prints the syslog message to the standard error stream if you do not supply the logger -n $HOSTNAME option, but in that situation nothing is sent to the syslog daemon.  If you do use the logger -n $HOSTNAME option, nothing is printed to the standard error stream.  There is no clear information as to why this option is not following the behavior documented in the logger man page.
  • If you try to send a message with a facility of kern, that will be silently converted to a facility of user.  This action is documented in the logger man page, but in casual testing, it is likely that you would bump into this long before you read the man page, and get mighty confused by it.  For that reason, in the commands we present here, we always use a facility of user, so there will be no question of such a hidden transform occurring.
  • In our own testing, the logger -u /dev/log option had no effect that we could discern.  (That socket path is drawn from the rsyslogd man page.)  Therefore, so as not to create any confusion as to what is happening, we do not ever attempt to run test messages into the system via that channel.  Instead, we always use a UDP or TCP port.
  • The logger -T option defaults to using port 601/tcp (the syslog-conn service in /etc/services), not 514/tcp.  So if you want to send messages via TCP, you will want to either set up /etc/rsyslog.conf to open TCP port 601 (as opposed to TCP port 514, which is mentioned instead in the default /etc/rsyslog.conf file), or use the logger -T -P 514 option.  Because port 601/tcp is supposed to use the BEEP protocol, and rsyslog support of the BEEP protocol is not complete and not recommended for general use, we choose to use port 514/tcp even though it is not officially sanctioned for that usage.  We show that port usage in our /etc/rsyslog.conf setup instructions under Configuring the rsyslog service to receive syslog messages, and use an explicit port number on our logger commands.  If you don't like that choice, or if you already have some other port set up for rsyslog receipt of syslog messages over TCP on your GroundWork server, you can make the obvious adjustments.  See Selecting port numbers for the tsyslogd container and SELinux constraints on port numbers for more information.

A typical test command sends a message with priority user.err and gets it logged where we can see it.  A logger command to send such a message via TCP would be:

logger -n $HOSTNAME -T -P 514 -p user.err Testing testing 123
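
If you have also configured rsyslogd to listen on UDP port 514, the corresponding test command uses the logger -d (datagram) option in place of -T.  This is a sketch, assuming a UDP listener is actually open on that port:

logger -n $HOSTNAME -d -P 514 -p user.err Testing testing 123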

Using netcat

netcat is a general-purpose tool for transferring data over sockets.  As such, it can be used to send a message to an rsyslogd daemon.  The Telegraf syslog input plugin documentation suggests a couple of such commands.  However, doing this correctly requires that you get very familiar with the details of all the fields in a properly-formatted syslog message, and that seems generally unproductive except in the most extreme troubleshooting situations.  Therefore, we recommend using the logger command instead, for sending messages.

A more-interesting use of netcat is to see precisely what the rsyslog service is sending to the tsyslogd container.  Shut down the container, then run:

netcat -l 26514

using whatever port you have chosen for the tsyslogd container to use externally.  Essentially, netcat takes on the role of the container.  Then in a different terminal window, send some messages to the rsyslog service using logger.  If the rsyslog service is properly forwarding messages, you will see them in your terminal window as the output of the netcat command.
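
Note that netcat syntax varies somewhat between implementations, and the command above listens on TCP.  Depending on your distribution, and on whether you have configured the rsyslog service to forward to the container over TCP or UDP, you may need one of the following variants instead (a sketch; adjust the port number to match your own setup):

netcat -l 26514        # OpenBSD-style netcat, TCP listener (as shown above)
netcat -l -p 26514     # traditional netcat variants expect the port after -p
netcat -u -l 26514     # add -u if the forwarding uses UDP instead of TCP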

Using socat

socat can be considered an advanced netcat.  As such, if you're going to learn a new tool of this kind, perhaps this is the one to choose.
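
For example, a socat equivalent of the netcat listener shown above might look like the following; this is a sketch, using the same example external container port as before, so adjust the port to your own setup:

socat -u TCP-LISTEN:26514,reuseaddr,fork STDOUT

As with netcat, shut down the tsyslogd container first so that socat can bind to the port, then send test messages with logger from another terminal window.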

An rsyslogd output file

Once you know how to send a message to the local rsyslogd daemon on the GroundWork server, the first order of business is to prove that the daemon is actually receiving and processing it.  A typical /etc/rsyslog.conf config file will contain this setting:

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages

Note that this line may in some circumstances be misleading.  In our own lab environment (admittedly a messy context), we have sometimes seen the /var/log/messages file be rotated, but the rsyslogd daemon continues to log into the rotated (and therefore renamed) logfile.  It may be useful to run "ls -ltr /var/log/messages*" and see which file is actually receiving the most-recent messages.  If it is not /var/log/messages itself, you should be able to bounce the rsyslogd daemon (service rsyslog restart or systemctl restart rsyslog) and get it logging again in the expected file.  If this happens to you, you will probably want to track down the source of the difficulty and solve it on a more-permanent basis.

With that in play, along with the setup described under Configuring the rsyslog service to receive syslog messages to have rsyslogd open up a local TCP port, you can use that standard setup for testing.  Open two terminal windows on your GroundWork server.  In one of them, run:

tail -f /var/log/messages

In the other terminal window, run the following logger command.  $HOSTNAME is a shell or environment variable which is typically predefined for you in your terminal window.

logger -n $HOSTNAME -T -P 514 -p user.err This message should be received and logged.

You should see that message show up in the logfile.  If it does not, check your rsyslogd setup, and make sure you have bounced that daemon to pick up any changes.

Since you may be typing that command a lot while testing, here is a bash alias for it.

alias lg="logger -n $HOSTNAME -T -P 514 -p user.err This message should be received and logged."

The tsyslogd container log

You can view the end of the log from the tsyslogd container using the following command, executed from within your gw8 directory:

docker-compose logs --tail=5 -f tsyslogd

Leave that running in a terminal window while you execute tests in other terminal windows, to see what gets logged as the container operates.  Since you may be typing that command a lot while testing, here is a bash alias for it.

alias dcl="docker-compose logs --tail=5 -f tsyslogd"

Log-control settings in the telegraf-syslog.conf configuration file

There are four key settings in the telegraf-syslog.conf configuration file.  We show them here in a fairly logical sequence, which is not the order in which they appear in the file.  All of these items are set in the delivered config file to minimize the amount of logging from the tsyslogd container.  So to see anything interesting, you will need to alter these settings and bounce the container to pick up the changes.
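
A typical way to bounce the container so that it re-reads telegraf-syslog.conf is shown below, run from within your gw8 directory; the exact procedure may vary depending on how your compose project is managed:

docker-compose restart tsyslogd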

When interpreting these settings, it helps to have a mental picture of the sequence of plugins that the data will flow through inside Telegraf within the tsyslogd container.  Using the notation for the respective configurations of those plugins in the telegraf-syslog.conf file, we have, in order:

[[inputs.syslog]]
[[processors.printer]]   (a tap for logging at this stage, if configured; otherwise ignored)
[[processors.classify]]
[[outputs.file]]         (a tap for logging at this stage, if configured; otherwise ignored)
[[outputs.groundwork]]

The output of [[processors.classify]] goes to both [[outputs.file]] and [[outputs.groundwork]] in parallel; there is no special sequencing between those two tail-end plugins.
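
To make that sequencing concrete, here is a hypothetical excerpt showing how the order directives of the two processor plugins establish their relative positions; the numeric values are illustrative only, and not necessarily those used in the delivered telegraf-syslog.conf file:

[[processors.printer]]
  order = 1    # runs first, immediately after the syslog input plugin

[[processors.classify]]
  order = 2    # runs second, on the metrics passed along by processors.printer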

The quiet option

[agent]
  quiet = true

The quiet option determines whether Telegraf should log only error-level messages.  This is not full control over all logging, but setting it to true does bring the noise down as much as possible, given the other settings.  For debugging of the syslog input plugin or the classify processor plugin themselves, you will want to either comment this out or set it to false.  That said, this option does not control logging from either the processors.printer or outputs.file plugin, so if all you need is the summary lines at that level, you can leave the quiet option set to true.

The debug option

[agent]
  # debug = false

The debug option determines the level of log messages that will appear in the log from the plugins we care most about, primarily the classifier.  To see all the available detail from the classifier in the container log, you will want to uncomment this option and set it to true.
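
Putting the two agent options together, a maximally verbose configuration for troubleshooting the plugins themselves would look like this; remember to bounce the container afterward so the changes take effect:

[agent]
  quiet = false
  debug = true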

The processors.printer plugin

# [[processors.printer]]

The processors.printer plugin is positioned in the chain of plugins immediately after the syslog input plugin, by dint of its order directive having a smaller value than the order directive of the processors.classify plugin.  Setup for the processors.printer plugin is all commented out as delivered.  It is there in case you wish to log the output of the syslog input plugin, showing successful receipt of syslog messages.  Uncommenting the setup for this plugin allows you to see exactly when the syslog input plugin received a message and was able to parse it and send it downstream.  The logged output from this plugin will look similar to the following, all on one line but folded here manually for easier viewing.

syslog,appname=myusername,facility=user,hostname=myhost.com,severity=err
    timestamp=1658200282000000000i,message=" Testing testing 123",
    version=1i,facility_code=1i,severity_code=3i 1658200282111817490

That information is all logged in standard InfluxDB Line Protocol format (measurement name, tags, fields, timestamp).

The outputs.file plugin

# [[outputs.file]]

The outputs.file plugin is positioned in the chain of plugins after the processors.classify plugin, to allow you to easily see the classification results.  Setup for the outputs.file plugin is all commented out as delivered.  It is there in case you wish to log the output of the classify processor plugin, in compact form.  Uncommenting the setup for this plugin will log output similar to the following, all on one line but folded here manually for easier viewing.

syslog,appname=myusername,facility=user,hostname=myhost.com,severity=err,status=SERVICE_UNKNOWN
    message=" Testing testing 123",version=1i,facility_code=1i,severity_code=3i,
    timestamp=1658202259000000000i 1658202259284942617

That information is all logged in standard InfluxDB Line Protocol format (measurement name, tags, fields, timestamp).  Of particular note is the status tag, which contains the classification result that will be used as the status of the syslog service on the myhost.com host.
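
Since the status tag appears on every line logged by the outputs.file plugin, one simple way to filter the container log down to just the classification results would be a command like the following, run from within your gw8 directory (a sketch, assuming you have uncommented the outputs.file setup):

docker-compose logs tsyslogd | grep 'status='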

What to test

It can help to have three terminal windows open side-by-side when looking at the tsyslogd container log; the commands for each window are recapped after this list:

  • one terminal in which to type in logger commands
  • one terminal in which to tail /var/log/messages to observe that the messages were seen by the rsyslogd daemon
  • one terminal in which to tail the tsyslogd container log to see what is going on inside the container as messages arrive
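
Using the commands and aliases described earlier, a typical arrangement of those three windows would be:

# Terminal 1:  send test messages (or use the lg alias defined earlier)
logger -n $HOSTNAME -T -P 514 -p user.err Testing testing 123

# Terminal 2:  watch the rsyslogd daemon log those messages
tail -f /var/log/messages

# Terminal 3:  watch the tsyslogd container log (or use the dcl alias),
# run from within your gw8 directory
docker-compose logs --tail=5 -f tsyslogd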

Check the open ports on your rsyslog service

To test whether rsyslogd has the expected port open:

netstat -pantu | grep rsyslogd

You should see output similar to this:

% netstat -pantu | grep rsyslogd
tcp        0      0 0.0.0.0:514             0.0.0.0:*               LISTEN      27519/rsyslogd      
tcp6       0      0 :::514                  :::*                    LISTEN      27519/rsyslogd      
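
On newer systems where the netstat command is no longer installed, the ss utility reports the same listening sockets:

ss -pantu | grep rsyslogd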

There are major security risks in having an rsyslog server open to the world on port 514.  On CentOS, you can use rich firewall rules to limit incoming connections on port 514 to whitelisted IP ranges.  In the examples below, the first set of commands simply opens port 514 to all sources; the second set uses rich rules to accept connections only from a whitelisted source range:

# firewall-cmd --permanent --add-port=514/tcp
# firewall-cmd --permanent --add-port=514/udp
# firewall-cmd --reload

# firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="123.123.123.0/21" port port="514" protocol="tcp" accept'
# firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="123.123.123.0/21" port port="514" protocol="udp" accept'
# firewall-cmd --reload

That is just a tiny sample of what you might do for security, to at least make you aware of such concerns.  For us here, that is a peripheral topic as regards troubleshooting syslog-message processing.

Test direct submission of messages from the GroundWork server

Use the following steps to test submission of messages via the local rsyslogd daemon.

  • Tail the /var/log/messages file, and run the logger command on the GroundWork server to send messages to the rsyslogd daemon.  Verify that they got received and logged.
  • Adjust the configuration details in the telegraf-syslog.conf configuration file to give yourself adequate visibility into what is going on inside the container.
  • Bounce the tsyslogd container to pick up the changes.
  • Run the logger command to send messages to the rsyslog service.  Verify that the messages got received and logged by the rsyslogd daemon.
  • Look at the container log to verify that the messages were received by the Telegraf syslog input plugin, and so would be passed downstream to the classify plugin.
  • From the container log data, check the classification and disposition of the messages.
  • If the messages are not dropped by the classifier, you should be able to see the message data in the Status screen in GroundWork Monitor, under the syslog service on the host that is the GroundWork server itself.

Test submission of messages from a remote rsyslogd daemon

Use the following steps to test submission of messages from remote devices and hosts.  An example remote logger command follows the list.

  • Tail the /var/log/messages file on the GroundWork server so you can see evidence of incoming messages from remote hosts.
  • Run the logger command on the remote host, or otherwise trigger a device, to send log messages to the GroundWork server.  Verify that the messages got received and logged by the rsyslogd daemon on the GroundWork server, separate from whatever happens thereafter inside the tsyslogd container.
  • Follow the same steps as in the preceding procedure, giving yourself visibility into the inner workings of the tsyslogd container and verifying receipt and processing of messages there, as forwarded from the rsyslogd daemon on the GroundWork server.
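
For instance, on a remote Linux host you could send a test message to the GroundWork server with a command like the one below, where gw-server.example.com is a placeholder for your GroundWork server's hostname and 514 is the TCP port opened in the rsyslog configuration described earlier:

logger -n gw-server.example.com -T -P 514 -p user.err Test message from a remote host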

What to do as a last resort

In extreme circumstances, it is possible to shut down the rsyslog service and run the rsyslogd daemon manually, in debug mode:

service rsyslog stop
/usr/sbin/rsyslogd -dn 2>&1 | tee /tmp/rsyslog-debug-log

See the following references if you get to the point where you think that might be necessary.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s1-debugging_rsyslog

https://www.rsyslog.com/doc/master/troubleshooting/debug.html

https://www.rsyslog.com/doc/master/troubleshooting/debug.html#security-risks

You can interrupt with Ctrl-C when done, then restart the daemon:

service rsyslog start

Cleaning up after testing

Once you debug and fix whatever problem you faced, you will want to remove whatever test scaffolding you put in place.  Such items may include:

  • Turning off all extra in-container logging, including disabling any extra plugins you added while testing.
  • Removing any extra logging you may have put into place from the rsyslog service.
  • Possibly, disabling the syslog service port you used for debugging, if that will not be used in production.
  • Reversing any firewall changes you made on a supposedly temporary basis.
  • Reversing any SELinux port changes you made on a supposedly temporary basis.
  • Restarting the rsyslog service.
  • Restarting the container.

Related Resources

About Syslog Monitoring

Syslog Monitoring Installation