About Telegraf

Telegraf is an open source monitoring agent written in Go that runs as a standalone executable or in a container. It is efficient and has a large base of plugins for many sources, but it also has many configuration options, so we consider our support for it an advanced feature. GroundWork supports monitoring resources with Telegraf, and version 1.21.2 of Telegraf includes the GroundWork Output plugin, which makes it easier to use with GroundWork servers (CE or EE).

System Requirements 

You can run Telegraf according to the Telegraf documentation or in a container. Running Telegraf in a Docker container is easy, and GroundWork supports this, either on a GroundWork server or wherever you run your Docker containers. We prefer this method and cover it in this document. 

You will need:

  • A GroundWork server to receive the metrics gathered by Telegraf, available on a configurable TCP port
  • A (free) Docker Hub account (https://hub.docker.com)
  • Internet access from the GroundWork server (or you can also separately pull and load the Docker image)
  • Network access from the GroundWork server to the assets you want to poll with Telegraf
  • Root access to your GroundWork server, if you want to use Telegraf to process SNMP Traps or otherwise listen on ports below TCP/1024
  • Docker Engine version 20.10 or later. You can check the installed version with the command docker version.
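
The Docker Engine requirement above can be checked from a script. The following is a minimal sketch, assuming bash and GNU sort are available; the helper name version_at_least is made up for this illustration and is not part of Docker or GroundWork.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the client is installed and the daemon answers.
if command -v docker >/dev/null 2>&1 &&
   engine="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" &&
   [ -n "$engine" ]; then
  if version_at_least "$engine" "20.10"; then
    echo "Docker Engine $engine meets the 20.10 minimum"
  else
    echo "Docker Engine $engine is older than 20.10"
  fi
fi
```

If Docker is missing or the daemon is not reachable, the sketch prints nothing rather than failing.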

Deployment Options

How you deploy Telegraf is up to you. Telegraf itself is just a Go executable with configuration files, and there are too many ways to run it to list here. These instructions will get you going with a Telegraf container on a GroundWork server, but there's no reason you can't run Telegraf somewhere else and have it post data to a GroundWork server, as long as it is configured correctly.

You can use the GroundWork Output plugin as we will describe, but it's also possible to set up Telegraf to collect and host metrics in Prometheus Exposition format, which can be polled at a specific URL by the GroundWork TCG-APM connector. 

Why would you do this? Well, if you are running a few Telegraf agents and you want to be able to centrally turn on and off the importing of metrics on each to GroundWork without touching the agents, this is a good way to go. You can manage the Telegraf instances the APM connector polls from the APM configuration screen in GroundWork Monitor. It's also good if you are used to the Prometheus format Telegraf uses and want to keep using it.

Conversely, you wouldn't want to do this if you are using Telegraf to poll assets you can connect to from the GroundWork server - just use the containerized Telegraf with the GroundWork Output plugin and input plugins that poll those assets. Similarly, if you are using Telegraf passively to listen for messages like Syslog or SNMP Traps, then it makes sense to use the GroundWork Output plugin to immediately forward these to GroundWork, rather than poll them with the APM connector. You may miss data if the APM polling interval is longer than the gap between messages. 
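
For the Prometheus approach described above, the following sketch writes a small Telegraf configuration fragment that enables the prometheus_client output plugin. It is an illustration only: the file name prometheus-output.conf and the host name telegraf-host are assumptions, and 9273 is the plugin's default listen port. The URL you give the APM connector would then point at this endpoint.

```shell
# Hypothetical file name; place the fragment wherever your Telegraf reads its configuration.
conf_file="${CONF_FILE:-./prometheus-output.conf}"

cat > "$conf_file" <<'EOF'
# Expose collected metrics for polling instead of pushing them.
[[outputs.prometheus_client]]
  ## Address and port to listen on (9273 is the plugin default).
  listen = ":9273"
EOF

echo "wrote $conf_file"
# After restarting Telegraf, the endpoint can be checked by hand, for example:
#   curl -s http://telegraf-host:9273/metrics | head
```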

Setting Up Telegraf to Run in a Container

  1. Assuming you have all the requirements listed above, log in to your GroundWork server as the gwos user, and in the gw8 directory, edit the docker-compose.override.yml file.  Add the following: 

      # Telegraf acting in a capacity as a syslog daemon for snmptraps
      snmptrap:
        image: groundworkdevelopment/telegraf-docker:master
        volumes:
          - ${PWD}/snmptrap/etc/snmp/snmptt.conf.d:/etc/snmp/snmptt.conf.d
          - ${PWD}/snmptrap/etc/snmp/snmptt.ini:/etc/snmp/snmptt.ini
          - ${PWD}/snmptrap/etc/supervisor:/etc/supervisor
          - ${PWD}/snmptrap/etc/telegraf:/etc/telegraf
        depends_on:
          - groundwork
        entrypoint: ["bash", "-c"]
        command: ["while test $$(wget -qO - groundwork:8080/index.html | grep 'Groundwork Server' | wc -l) != 2;
                    do echo 'awaiting groundwork health'; sleep 10; done;
                    PATH=/opt/telegraf:$$PATH
                    exec supervisord -c /etc/supervisor/supervisord.conf"]
        ports:
          - "162:162/udp"
          # - "16514:6514/tcp"  # syslog
          # - "16514:6514/udp"  # syslog
    
        # gw8ctl up -d --force-recreate snmptrap ; echo -- ; gw8ctl logs -f snmptrap
        # gw8ctl exec snmptrap  snmptrap -v 2c -c public localhost ""  .1.3.6.1.6.3.1.1.5.4
        #
    
    

    Don't start it yet - it won't work until it is configured. 

    Occasionally, all of the changes in our image will already be part of an official Telegraf release. In that case, docker-compose.override.yml can instead target a potentially newer upstream image, for example:
    image: telegraf:1.21.4


    Now you are ready to configure Telegraf.

Configuring Telegraf 

In order to start Telegraf, you will need a configuration file. The modification you just made to docker-compose.override.yml uses Docker bind mounts to point the Telegraf container at configuration files under the gw8/snmptrap/etc directory. Specifically, Telegraf reads the file gw8/snmptrap/etc/telegraf/telegraf.conf.

The place to start is telegraf.conf. First, extract our SNMP example configuration archive, examples20221121.tar.gz, into the gw8 directory.
Inside the new examples directory, there is a snmptrap directory. Copy or move this directory to gw8/snmptrap.
The new directory contains everything targeted by the bind mounts in the docker-compose.override.yml file above.
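
The extraction and copy steps above might be scripted as follows. This is a sketch, assuming bash and that the archive unpacks into a top-level examples directory as described; the function name install_snmptrap_examples is hypothetical.

```shell
# Hypothetical helper: unpack the examples archive and copy its snmptrap directory into place.
#   $1: path to the archive   $2: the gw8 directory
install_snmptrap_examples() {
  local archive="$1" gw8_dir="$2"
  # Refuse to overwrite an existing configuration directory.
  if [ -e "$gw8_dir/snmptrap" ]; then
    echo "refusing to overwrite $gw8_dir/snmptrap" >&2
    return 1
  fi
  tar -xzf "$archive" -C "$gw8_dir" || return 1
  cp -r "$gw8_dir/examples/snmptrap" "$gw8_dir/snmptrap" || return 1
  # Confirm that the file Telegraf reads through the bind mount is present.
  [ -f "$gw8_dir/snmptrap/etc/telegraf/telegraf.conf" ]
}

# Example (paths are assumptions):
#   install_snmptrap_examples ~/gw8/examples20221121.tar.gz ~/gw8
```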

Now use your editor of choice to open gw8/snmptrap/etc/telegraf/telegraf.conf.

Inside this file, you will see the output plugin configuration: 

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Send metrics to nowhere at all
# [[outputs.discard]]
#   # no configuration

# [[outputs.file]]
#   ## Files to write to, "stdout" is a specially handled file.
#   files = ["stdout"]

# Send telegraf metrics to GroundWork Monitor
[[outputs.groundwork]]
  ## URL of your groundwork instance.
  url = "http://groundwork:8080"

  ## Agent uuid for GroundWork API Server.
  agent_id = ""

  ## Username and password to access GroundWork API.
  username = ""
  password = ""

  ## Default application type to use in GroundWork client
  ## SYSLOG|SNMPTRAP|TELEGRAF
  default_app_type = "SNMPTRAP"

  ## Default display name for the host with services(metrics).
  # default_host = "syslog"

  ## Default service state.
  # default_service_state = "SERVICE_OK"

  ## The name of the tag that contains the hostname.
  # resource_tag = "host"
  resource_tag = "hostname"

  ## The name of the tag that contains the host group name.
  # group_tag = "group"

The commented-out [[outputs.file]] definition is a file output that writes metrics to stdout so you can see them in the container log. You will not need it in production, but it is very useful at first. The active definition is for the GroundWork Output plugin. Here's what you need to know:

  1. Leave the URL at the default "http://groundwork:8080".

    If you are deploying Telegraf on one GroundWork Server and posting the data collected to another, set the URL to the https address of the GroundWork server you want to post the data to  (the same https URL as you would use for logging in). 

  2. For the agent_id, you will need to generate a UUID by typing the following at the command line:

    uuidgen

    You will get back a string similar to 115756e3-c852-4657-99f3-c010381612b6. Set the agent_id to your string. It just has to be a random uuid, nothing special.

  3. For the username and password, you can use any valid username and password on your GroundWork server. We recommend making a user specifically for the Telegraf plugin so it is easier to control. It need not have admin privileges; a user-level role is fine.
  4. For the resource_tag, you don't have to specify a string, but for compatibility with configurations that use APM, it's a good idea to use resource instead of the default host. That way you can more easily move back and forth between modalities on a single Telegraf instance. Note that the example configuration above sets it to hostname, which is the tag carried by the trap metrics.

    For more details about the GroundWork Output plugin, see https://github.com/influxdata/telegraf/tree/release-1.21/plugins/outputs/groundwork
    See also the Telegraf plugin documentation.
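
Step 2 above can also be done without opening an editor. The sketch below assumes bash and GNU sed (for in-place editing); the function name set_agent_id is hypothetical, and the configuration path in the example is an assumption based on the bind mounts described earlier.

```shell
# Hypothetical helper: write a UUID into the agent_id setting of a telegraf.conf.
#   $1: path to telegraf.conf   $2: the UUID to store
# Uses the in-place option (-i) of GNU sed; commented-out agent_id lines are left alone.
set_agent_id() {
  local conf="$1" uuid="$2"
  sed -i "s|^\([[:space:]]*agent_id[[:space:]]*=[[:space:]]*\).*|\1\"$uuid\"|" "$conf"
}

# Example:
#   set_agent_id ~/gw8/snmptrap/etc/telegraf/telegraf.conf "$(uuidgen)"
```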

Starting Telegraf

When you are ready, start the gw8 deployment, including the Telegraf SNMP trap container configured above:

docker-compose up -d

This will pull the container image and start the container with the configuration you created.

Making Sense of the Configuration

At this point, if all is well, the Telegraf container is started and is listening for SNMP traps on the port published in docker-compose.override.yml (162/udp). You can verify it is started with docker-compose:

docker-compose ps | grep snmptrap
dockergw8_snmptrap_1         bash -c while test $(wget  ...   Up

To see the debug logs, just look at the container logs: 

docker-compose logs -f snmptrap

This will give you some output similar to the following: 

Attaching to dockergw8_snmptrap_1
snmptrap_1        | awaiting groundwork health
snmptrap_1        | 2022-09-08 10:47:18,346 INFO Included extra file "/etc/supervisor/conf.d/snmptrapd.conf" during parsing

A little further down in the log we see the debug message from the GroundWork plugin:

snmptrap_1        | syslog,Severity=WARNING,appname=snmptt,facility=local0,hostname=127.0.0.1,source=127.0.0.1 procid="16",message=".1.3.6.1.6.3.1.1.5.4 Normal \"Status Events\" 127.0.0.1 - Link up on interface $1.  Admin state: $2.  Operational state: $3",version=1i,facility_code=16i,severity_code=4i,timestamp=1662634069000000000i 1662634069606748077

Any time you make a change to the Telegraf configuration, you will need to restart the Telegraf container. Do this now:

From the gw8 directory, type: 

docker-compose kill snmptrap
docker-compose up -d

In summary, this advanced method of monitoring lets you map hundreds of plugins and thousands of metrics into hosts and services in GroundWork, as well as assign status, custom service descriptions, status text and thresholds. Mastering the Telegraf configuration files is not trivial, but it is very rewarding: it lets you match the data gathering capabilities of Telegraf with GroundWork's automated graphing, dashboards, notification and reporting capabilities.


If you do not see 162/udp open and listening, it's likely that you will need to adjust the firewall settings. Unless you are otherwise blocking 162/udp, this may work:

docker-compose down
sudo service docker restart
sudo service docker start
docker-compose up -d
docker-compose restart snmptrap

Be advised that this process requires sudo (root) access, and that monitoring will be paused during the restart. 
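
One way to check whether 162/udp is listening on the Docker host is to filter the output of ss. This is a sketch, assuming the iproute2 ss command and its usual column layout for ss -lun, where the local address is the fourth field; the function name has_udp_listener is hypothetical. Depending on how Docker publishes ports on your host, the listener may belong to a Docker proxy process rather than to the container itself.

```shell
# Hypothetical helper: read `ss -lun` output on stdin and succeed if a socket
# is bound to UDP port $1 (matched against the fourth column, Local Address:Port).
has_udp_listener() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit found ? 0 : 1 }'
}

# Example, run on the Docker host:
#   ss -lun | has_udp_listener 162 && echo "162/udp is listening"
```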

Configuring Status Mapping

At this point in the configuration, you should see the severity codes of SNMP traps in the event logs of individual services. However, a severity code is not guaranteed to map to your preferred service status. Service status is what affects the display in NOC boards and status dashboards, and what triggers notifications. The mapping is fully configurable.

A default framework is provided in the latest version of the examples archive linked above. You can also change it manually, or add it to the file gw8/snmptrap/etc/telegraf/telegraf.conf within the PROCESSOR PLUGINS block, as follows.

telegraf.conf

# Create a status tag based on severity tag of trap.
# This processor depends on previous case processing of tag and key.
[[processors.enum]]
  order = 40

  [[processors.enum.mapping]]
    tag = "Severity"
    dest = "status"

    [processors.enum.mapping.value_mappings]
      EMERG = "SERVICE_UNSCHEDULED_CRITICAL"
      ALERT = "SERVICE_UNSCHEDULED_CRITICAL"
      CRIT = "SERVICE_UNSCHEDULED_CRITICAL"
      ERR = "SERVICE_UNSCHEDULED_CRITICAL"
      WARNING = "SERVICE_WARNING"
      NOTICE = "SERVICE_OK"
      INFO = "SERVICE_WARNING"
      DEBUG = "SERVICE_WARNING"


The source severity codes are all standard syslog severity codes. They can be mapped to the list of GroundWork output plugin service status codes found here: https://github.com/gwos/telegraf/blob/feat/classify-processor/plugins/outputs/groundwork/README.md

To pick up changes in this configuration, the snmptrap container must be restarted.
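
To reason about a mapping before restarting the container, it can help to mirror the table in a few lines of shell. The function below only restates the example value_mappings shown above for illustration; it is not how Telegraf evaluates the processor, and the name severity_to_status is hypothetical.

```shell
# Hypothetical mirror of the example value_mappings: print the status for a severity.
# Prints nothing for a severity that is not in the table.
severity_to_status() {
  case "$1" in
    EMERG|ALERT|CRIT|ERR) echo "SERVICE_UNSCHEDULED_CRITICAL" ;;
    WARNING|INFO|DEBUG)   echo "SERVICE_WARNING" ;;
    NOTICE)               echo "SERVICE_OK" ;;
  esac
}

severity_to_status WARNING   # prints SERVICE_WARNING
```

With this mapping, the sample trap in the log output earlier (Severity=WARNING) would be given the status SERVICE_WARNING.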

Adding MIBs

There is a default deployment of MIBs in the correct format located in the container at /var/lib/mibs. These have already been converted to the snmptt format.

If custom MIBs must be added, they can be placed in a new bind mount volume in docker-compose.override.yml. Alternatively, place them in the snmptrap/etc/snmp/snmptt.conf.d directory. However, these may need to be converted to the snmptt format.

Assuming these are mounted to /etc/snmp/snmptt.conf.d/, the following command will convert these files to the correct format. Be sure to back up the MIBs directory before running this command.

for i in snmptrap/etc/snmp/snmptt.conf.d/*-MIB ; do snmpttconvertmib --in="${i}" --out="${i}.snmptt.conf" ; done

These paths must, of course, be configured in the snmptt configuration files located in gw8/snmptrap/etc.
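
The backup recommended before conversion could be taken with a small helper like the one below. This is a sketch, assuming bash and tar; the function name backup_dir is hypothetical, and the archive is written next to the directory being backed up.

```shell
# Hypothetical helper: create a timestamped tar archive of a directory
# alongside it, and print the path of the archive.
#   $1: directory to back up
backup_dir() {
  local dir="${1%/}"
  local archive="${dir}.$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" && echo "$archive"
}

# Example, run from the gw8 directory:
#   backup_dir snmptrap/etc/snmp/snmptt.conf.d
```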

Configuration Notes

Delete excess inventory

In GroundWork Monitor 8.2.1 and above, it is trivial to remove hosts and services that are no longer getting updated. This integration has the capacity to add a lot of resources, so don't be shy about removing them. In the Status Summary, you can do this with the Delete Selected action on the Search/Actions submenu.
 

Turn off debug when done

In general you won't want to have the debug level on in production, and you may even want to turn off the "file" output plugin, especially if you frequently poll a lot of metrics. 

Beware of overload

As with any monitoring method, just because you can monitor something doesn't mean you should. If you see a plugin coming back with a lot more metrics than you need, use a processor plugin to drop the extras.

Appendices

Appendix A: Legacy SNMP Trap Translator Script

This script translates SNMP trap configurations generated by snmpttconvertmib, the application previously included with GroundWork Monitor version 7, into the Telegraf configuration format that GroundWork Monitor version 8 uses to process traps.

This script will convert the majority of standard configurations as-generated by snmpttconvertmib. In the event that a configuration cannot be processed, the portions which cannot be processed will be displayed in the output.

  • Specifically, any MODE or EXEC lines will be skipped and require manual intervention to restore, but the configuration will otherwise be generated.
  • The generated configurations include mappings from name to OID, so it is not necessary to load the MIBs into the Telegraf container for configurations generated by this script; translation is bypassed completely.
  • Resulting Telegraf configurations can be added to the ~/gw8/config/config.d/ directory.
  • We recommend using, at the very least, the configuration in the attached telegraf.conf. In that case, all you need is the telegraf.conf in ~/gw8/conf/
  • If you run this on a GroundWork 7 system, change the first line of the script to point to the GW7 perl binary, which is noted in a comment in the script itself.

Remaining usage information is provided by the script itself when it is run with the -h flag. Current usage is as follows:

usage:  gw_convert_snmptt_v2.pl [ -f /path/to/snmpttconverted.mib.conf ]
            [ -c {snmptraps_last|trapname}] [ -o {number} ] [ -m {oid} ] [ -d ] [ -r ]
        gw_convert_snmptt_v2.pl -h
where:  -f  The full path and file name of a mib.conf generated by snmpttconvertmib
        -c  The desired naming convention of the service.
                Specify snmptraps_last for a single snmptraps_last service per host for all traps.
                Specify trapname to have individual services created named after the trap itself when a trap is received.
        -o  A number divisible by 100. This number should be iterative and line up with other telegraf configurations in use.
                This number determines the execution order of configuration directives in Telegraf.
                So, if your last (or existing) configuration ended at order 399, select 400 for this option.
                If generating more configurations, pay attention to where those end and iterate this option appropriately.
        -m  Configuration value mapping. Only oid is supported currently for direct-to-OID mapping.
        -d  Over-write mode. This mode will write out configurations, replacing old ones if they exist.
                Not particularly dangerous to use, will only over-write files created by this script's naming convention.
        -r  A hidden feature (not that well hidden, I suppose)
                This will write a file which contains snmptrap and snmptranslate commands for each trap.
                It is quite useful to see that hosts and services are created when testing configurations.
                This option does not include any variables, is not supported, and is meant only for dev/test situations.

This script will write any generated files to the current working directory, as follows:
- If using -c snmptraps_last the Telegraf configuration will be named inputfilename-snmptraps_last.conf
- If using -c trapname the Telegraf configuration will be named inputfilename-trapname.conf
- The -r option will generate a file inputfilename-TRAPS.txt with the snmptrap/snmptranslate commands.

The generated configurations can then be copied to the ~/gw8/config/config.d/ directory.
Refer to the GW8 documentation for more detail on adding configurations.

example:
gw_convert_snmptt_v2.pl -f powernet419traps.conf -c trapname -o 300 -m oid -d -r
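
The -o value described in the usage text can be worked out with simple arithmetic. The sketch below is one reading of that rule (the next multiple of 100 strictly above the last order in use); the function name next_order is hypothetical and is not part of the script.

```shell
# Hypothetical helper: print the next multiple of 100 strictly greater than
# the last order number already in use.
next_order() {
  echo $(( ($1 / 100 + 1) * 100 ))
}

next_order 399   # prints 400
```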

Related Resources