About Telegraf

Telegraf is an open source monitoring agent written in Go. It runs as a standalone executable or in a container. It is efficient and has a large base of plugins for many sources, but it also has many configuration options, so we consider our support for it an advanced feature. GroundWork supports monitoring resources with Telegraf, and Telegraf 1.21.1 and later include the GroundWork Output plugin, which makes it easier to use with GroundWork servers (CE or EE).

System Requirements 

You can run Telegraf according to the Telegraf documentation or in a container. Running Telegraf in a Docker container is easy, and GroundWork supports this, either on a GroundWork server or wherever you run your Docker containers. We prefer this method and cover it in this document. We have even created our own downstream container with a little bit of scripting to make it easier to use in a GroundWork server context. 

You will need:

  • A GroundWork server to receive the metrics gathered by Telegraf, available on a configurable TCP port
  • A telegraf-docker container supplied by GroundWork, with the GroundWork Output plugin included (Telegraf 1.21.1 or later) 
  • Internet access from the GroundWork server (alternatively, you can pull and load the Docker image separately)
  • Network access from the GroundWork server to the assets you want to poll with Telegraf
  • Root access to your GroundWork server, if you want to use Telegraf to process SNMP Traps or otherwise listen on ports below 1024

Deployment Options

How you deploy Telegraf is up to you. Telegraf itself is just a Go executable with configuration files, and there are too many ways to run it to list them all. These instructions will get you going with a Telegraf container on a GroundWork server, but there's no reason you can't configure Telegraf running somewhere else to post data to a GroundWork server, as long as it is configured correctly.

You can use the GroundWork Output plugin as we will describe, but it's also possible to set up Telegraf to collect and host metrics in Prometheus Exposition format, which can be polled at a specific URL by the GroundWork TCG-APM connector. 

Why would you do this? Well, if you are running a few Telegraf agents and you want to be able to centrally turn on and off the importing of metrics on each to GroundWork without touching the agents, this is a good way to go. You can manage the Telegraf instances the APM connector polls from the APM configuration screen in GroundWork Monitor. It's also good if you are used to the Prometheus format Telegraf uses and want to keep using it.

Conversely, you wouldn't want to do this if you are using Telegraf to poll assets you can connect to from the GroundWork server. In that case, just use the containerized Telegraf with the GroundWork Output plugin and input plugins that poll those assets. Similarly, if you are using Telegraf passively to listen for messages like Syslog or SNMP Traps, then it makes sense to use the GroundWork Output plugin to immediately forward these to GroundWork, rather than poll them with the APM connector, because you may miss data if the APM polling interval is longer than the gap between messages.
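To make the risk of missed messages concrete, here is a minimal Python sketch. It is not GroundWork or Telegraf code. It assumes, purely for illustration, that a polled endpoint exposes only the most recent message it received; the function name and the timings are hypothetical.

```python
def messages_seen_by_polling(message_times, poll_interval, end_time):
    """Return the message timestamps a poller would observe.

    Assumption: the polled endpoint exposes only the latest message, so
    earlier messages between two polls are overwritten.
    message_times must be sorted in ascending order.
    """
    seen = []
    poll_time = poll_interval
    while poll_time <= end_time:
        # Messages that arrived up to and including this poll.
        arrived = [t for t in message_times if t <= poll_time]
        # Record the latest message unless the previous poll already saw it.
        if arrived and (not seen or arrived[-1] != seen[-1]):
            seen.append(arrived[-1])
        poll_time += poll_interval
    return seen


# Traps arrive at 5 s, 20 s, 25 s and 70 s; the poller runs every 60 s.
traps = [5, 20, 25, 70]
observed = messages_seen_by_polling(traps, poll_interval=60, end_time=120)
missed = [t for t in traps if t not in observed]
print("observed:", observed)
print("missed:", missed)
```

Under these assumptions the first two traps are never observed, which is why forwarding passive messages immediately is the safer choice.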

Adding the Telegraf Application Type

In order for Telegraf to be permitted to create hosts/services in GroundWork, it must be added as an application type.

  1. To add the Telegraf Application Type, become the gwos user (su - gwos), change to the gw8 directory, and enter:

    docker-compose exec pg psql gwcollagedb
  2. Run the following query:

    INSERT INTO ApplicationType(Name, DisplayName, Description, StateTransitionCriteria) VALUES ('TELEGRAF', 'TELEGRAF', 'Data from the Telegraf Plugin', 'Device;Host;ServiceDescription');
  3. Restart GroundWork:

    docker-compose down
    docker-compose up -d

Setting Up Telegraf to Run in a Container

  1. Assuming you have all the requirements listed above, log in to your GroundWork server as the gwos user, and in the gw8 directory, edit the docker-compose.override.yml file.  Add the following: 

      telegraf:
        image: groundworkdevelopment/telegraf-docker:latest
        command: /docker_cmd.sh
        volumes:
           - ${PWD}/config/:/config
           - /var/run/docker.sock:/var/run/docker.sock

    Don't start it yet - it won't work until it is configured. 

  2. Create the configuration directory: 

    mkdir config

    Now you are ready to configure Telegraf.

Configuring Telegraf 

In order to start Telegraf, you will need a configuration file. The docker-compose.override.yml modification you just made tells the telegraf-docker container to look for its configuration files in the gw8/config directory. Specifically, it looks for the file gw8/config/telegraf.conf, and additional *.conf files in the gw8/config/config.d directory.
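As a quick way to check which files match that description, here is a minimal Python sketch. It only mirrors the lookup rule described above (telegraf.conf plus config.d/*.conf) and is not part of the container. The function name is hypothetical, and the sorted order is for readability only, since the order in which the container reads the files is not stated here.

```python
import tempfile
from pathlib import Path


def list_telegraf_configs(config_dir):
    """Return telegraf.conf (if present) followed by config.d/*.conf files."""
    root = Path(config_dir)
    main = root / "telegraf.conf"
    found = [main] if main.is_file() else []
    # Only names ending in .conf match; *.conf.example files are skipped.
    found += sorted((root / "config.d").glob("*.conf"))
    return found


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config"
    (config / "config.d").mkdir(parents=True)
    (config / "telegraf.conf").touch()
    (config / "config.d" / "http_tests.conf").touch()
    (config / "config.d" / "http_tests.conf.example").touch()  # not matched
    names = [p.relative_to(config).as_posix()
             for p in list_telegraf_configs(config)]

print(names)
```

On a GroundWork server you would call list_telegraf_configs("config") from the gw8 directory instead of building a temporary directory.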

The place to start is telegraf.conf. Here is an example file: telegraf.conf.example

Inside this file you will find some global agent configuration including a debug level setting for troubleshooting. You will want to turn this off in production, but leave it on for now:

  ## Log at debug level.
  debug = true
  ## Log only error level messages.
  quiet = false

After this you will see the output plugin configuration: 

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Standard out - so we see metrics in the logs. Optional in production.
# Send telegraf metrics to file(s)
[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout"]

  data_format = "prometheus"
  prometheus_export_timestamp = true
  prometheus_sort_metrics = false
  prometheus_string_as_label = true


########################################################
#   GroundWork Output Plugin                           #
########################################################
# Configuration for gw8 server to send metrics to
[[outputs.groundwork]]
  ## HTTP endpoint for your groundwork instance running in same container.
   url = "http://foundation:8080"
  ## Agent uuid for Groundwork API Server
   agent_id = "XXXXXXXXXXXXXXXXXXXXXXXX"
  ## Username to access Groundwork API
   username = "admin"
  ## Password to user with username
   password = "*****"
   resource_tag = "resource"

The first section defines a file output so we can see the metrics in the log file. Again, you will not need this in production, but it is very useful at first. The second definition is for the GroundWork Output plugin. Here's what you need to know:

  1. Set url to the HTTPS address of your GroundWork server (the same address you use for logging in).
  2. For the agent_id, generate a UUID by typing the following at the command line:

    uuidgen

    You will get back a string similar to 115756e3-c852-4657-99f3-c010381612b6. Set agent_id to your string. It only has to be a random UUID, nothing special.

  3. For the username and password, you can use any valid username and password on your GroundWork server. We recommend making a user specifically for the Telegraf plugin so it is easier to control.
  4. For the resource_tag, you don't have to specify a string, but for compatibility with configurations that use APM, it's a good idea to use resource. That way you can more easily move back and forth between modalities on a single Telegraf instance.

    For more details about the GroundWork Output plugin, see https://github.com/influxdata/telegraf/tree/release-1.21/plugins/outputs/groundwork
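For step 2 above, if uuidgen is not installed on the machine where you are editing the configuration, a random UUID can also be generated with Python's standard library. This is a minimal sketch, and the printed line is just a convenience for pasting into the [[outputs.groundwork]] section.

```python
import uuid

# uuid4() returns a random (version 4) UUID, which is all agent_id needs.
agent_id = str(uuid.uuid4())

# Print a line that can be pasted into the GroundWork Output plugin section.
print(f'agent_id = "{agent_id}"')
```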

The next section is a typical input configuration example. For our example, we will use the http_response input plugin: 

##########################################################################
#########         INPUT PLUGINS ##########################################
##########################################################################


[[inputs.http_response]]
  ## address is deprecated in 1.12, use 'urls'

  ## List of urls to query.
  urls = [
    "https://gwos.com",
  ]

This tells Telegraf to check the URL(s) listed. The check frequency comes from the global agent settings, where it is set to every 10 seconds. 

Starting Telegraf

When you are ready, place this file in the gw8/config directory as telegraf.conf. You are now ready to start Telegraf: 

docker-compose up -d

This will pull the container, and start it with the configuration you created. 

Making Sense of the Configuration

At this point, if all is well, the Telegraf container is running and monitoring the website or sites you placed in the input plugin URL list. You can verify that it has started with docker-compose: 

docker-compose ps | grep telegraf
dockergw8_telegraf_1   /docker_cmd.sh         Up

To see the debug logs, just look at the container logs: 

docker-compose logs -f telegraf

This will give you some output similar to the following: 

telegraf_1       | # HELP http_response_content_length Telegraf collected metric
telegraf_1       | # TYPE http_response_content_length untyped
telegraf_1       | http_response_content_length{device="",method="GET",result="success",result_type="success",server="https://gwos.com",status_code="301"} 162 1640726020000

This is the result of the "file" output plugin writing the results in Prometheus Exposition format to the log. The metric name in this case is "http_response_content_length", the value is 162, and the trailing number is the timestamp in milliseconds. The tags (inside the curly brackets) are the data the http_response plugin reports, and this is what we have to work with in reporting to GroundWork. 
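If you want to inspect these log lines programmatically, the following Python sketch splits one sample line into name, tags, value and timestamp. It is a simplified reader for lines shaped like the one above, not a complete exposition-format parser. The function and constant names are hypothetical.

```python
import re

SAMPLE = (
    'http_response_content_length{device="",method="GET",result="success",'
    'result_type="success",server="https://gwos.com",status_code="301"} '
    '162 1640726020000'
)

# Metric name, optional {labels}, value, optional timestamp (milliseconds).
LINE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)(?:\s+(\d+))?$')
# One key="value" pair; the value may contain backslash-escaped characters.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into its parts."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, labels, value, timestamp = match.groups()
    return {
        "name": name,
        "tags": dict(LABEL_RE.findall(labels or "")),
        "value": float(value),
        "timestamp_ms": int(timestamp) if timestamp else None,
    }


parsed = parse_sample(SAMPLE)
print(parsed["name"], parsed["value"])
print(parsed["tags"]["server"], parsed["tags"]["status_code"])
```

Comment lines such as "# HELP" and "# TYPE" are not sample lines, so this sketch rejects them with a ValueError. Strip the "telegraf_1 |" log prefix before passing a line in.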

A little further down in the log we see the debug message from the GroundWork plugin:

telegraf_1       | 2021-12-28T21:13:40Z I! [outputs.groundwork] Send request headers:map["Accept":"application/json" "Content-Type":"application/json" "GWOS-API-TOKEN":"61dde1cb-c9a7-4324-9244-47c036da9f90" "GWOS-APP-NAME":"telegraf"] status:200 method:"POST" url:"https://deimos.gwos/api/monitoring?dynamic=true" payload:"{\"context\":{\"appType\":\"TELEGRAF\",\"agentId\":\"115756e3-c852-4657-99f3-c0103815b2b6\",\"traceToken\":\"b2f80718-ed45-559e-8bfe-84f9c85eb6e9\",\"timeStamp\":\"1640726020293\",\"version\":\"1.0.0\"},\"resources\":[{\"name\":\"telegraf\",\"type\":\"host\",\"status\":\"HOST_UP\",\"lastCheckTime\":\"1640726020293\",\"services\":[{\"name\":\"http_response\",\"type\":\"service\",\"owner\":\"telegraf\",\"status\":\"SERVICE_OK\",\"lastCheckTime\":\"1640726020000\",\"metrics\":[{\"metricName\":\"response_time\",\"sampleType\":\"Value\",\"interval\":{\"endTime\":\"1640726020000\"},\"value\":{\"valueType\":\"DoubleType\",\"doubleValue\":0.065927795},\"unit\":\"1\"},{\"metricName\":\"http_response_code\",\"sampleType\":\"Value\",\"interval\":{\"endTime\":\"1640726020000\"},\"value\":{\"valueType\":\"IntegerType\",\"integerValue\":301},\"unit\":\"1\"},{\"metricName\":\"content_length\",\"sampleType\":\"Value\",\"interval\":{\"endTime\":\"1640726020000\"},\"value\":{\"valueType\":\"IntegerType\",\"integerValue\":162},\"unit\":\"1\"}

Parsing this, you can see that the GroundWork Output plugin applies defaults, namely "telegraf" for the hostname (\"name\":\"telegraf\",\"type\":\"host\") and "http_response" for the service name (\"name\":\"http_response\",\"type\":\"service\"). The status is "SERVICE_OK", which happens to be correct, but it is also a default: there is no tag with the key "status", which is what GroundWork uses to assign state. 
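The nesting of host, service and metrics is easier to see once the payload is loaded as JSON. The Python sketch below uses a trimmed, hand-written payload with the same field names as the log entry above. It is illustrative only and not a complete or authoritative schema, and the helper names are hypothetical.

```python
import json

# Trimmed payload mirroring the structure seen in the log entry.
PAYLOAD = json.dumps({
    "context": {"appType": "TELEGRAF"},
    "resources": [{
        "name": "telegraf",
        "type": "host",
        "status": "HOST_UP",
        "services": [{
            "name": "http_response",
            "type": "service",
            "status": "SERVICE_OK",
            "metrics": [
                {"metricName": "http_response_code",
                 "value": {"valueType": "IntegerType", "integerValue": 301}},
                {"metricName": "content_length",
                 "value": {"valueType": "IntegerType", "integerValue": 162}},
                {"metricName": "response_time",
                 "value": {"valueType": "DoubleType",
                           "doubleValue": 0.065927795}},
            ],
        }],
    }],
})


def metric_value(value):
    """Return the number stored under integerValue or doubleValue."""
    if value["valueType"] == "IntegerType":
        return value["integerValue"]
    return value["doubleValue"]


def summarize(payload_text):
    """Yield (host, service, status, metric, value) rows from a payload."""
    payload = json.loads(payload_text)
    for resource in payload["resources"]:
        for service in resource["services"]:
            for metric in service["metrics"]:
                yield (resource["name"], service["name"], service["status"],
                       metric["metricName"], metric_value(metric["value"]))


rows = list(summarize(PAYLOAD))
for row in rows:
    print(*row)
```

Every row carries the default host "telegraf" and the default service "http_response".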

Obviously, this could be better. We should be able to create hosts based on what we monitor in the url list. We should also be able to assign state based on the HTTP status codes, or on a failure of DNS to find the host at all. We could set thresholds to alarm on the metric values in the Status Summary using the Edit - Thresholds feature, but this is a manual process and we should really automate it. 

This level of control requires one or more "processor" plugins. Here's an example: http_tests.conf.example

The file is commented to explain what it does, so we won't go through it here. To use it, simply place it in the gw8/config/config.d directory and change the filename to end in .conf.

Any time you make a change to the Telegraf configuration, you will need to restart the Telegraf container. Do this now:

  1. From the gw8 directory, type: 

    docker-compose kill telegraf
    docker-compose up -d
  2. Reviewing the logs now, we see that more information is getting into our tags: 

    telegraf_1       | http_response_content_length{device="",message="The http response was a success and got a result of 301.",method="GET",resource="gwos.com",result="success",result_type="success",server="https://gwos.com",service="web_site_availability",status="SERVICE_WARNING",status_code="301"} 162 1640727630000
    • message="The http response was a success and got a result of 301."
    • resource="gwos.com"
    • service="web_site_availability"
    • status="SERVICE_WARNING"

      These new tags are created by the processor plugins: the hostname (resource) is extracted from part of the URL with a regex, the message (status text) and service (service description) are templated tags, and the status is the result of an enum mapping on the result code. These are the tags GroundWork expects, so they map cleanly into a host and a service with multiple metrics. The results appear in Status Summary.
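To make the regex, template and enum steps concrete, here is a minimal Python sketch of the same transformation applied to a tag dictionary. It does not reproduce http_tests.conf.example, whose contents are not shown here, and it is not how Telegraf runs processors internally. Only the 301 to SERVICE_WARNING mapping is confirmed by the log above. The other mapping entry, the default status, and all function and variable names are assumptions for illustration.

```python
import re

# Assumption: only "301" -> "SERVICE_WARNING" is confirmed by the log above;
# the "200" entry and the default are placeholders for illustration.
STATUS_BY_CODE = {
    "200": "SERVICE_OK",
    "301": "SERVICE_WARNING",
}
DEFAULT_STATUS = "SERVICE_UNKNOWN"

# Capture the host part of an http(s) URL.
HOST_RE = re.compile(r"^https?://([^/:]+)")


def enrich(tags):
    """Return a copy of tags with resource, service, message, status added."""
    out = dict(tags)
    # Regex step: derive the host name from the URL in the "server" tag.
    match = HOST_RE.match(tags["server"])
    out["resource"] = match.group(1) if match else tags["server"]
    # Template steps: a fixed service name and a message built from tags.
    out["service"] = "web_site_availability"
    out["message"] = (
        f"The http response was a {tags['result']} "
        f"and got a result of {tags['status_code']}."
    )
    # Enum step: translate the status code into a GroundWork status.
    out["status"] = STATUS_BY_CODE.get(tags["status_code"], DEFAULT_STATUS)
    return out


before = {
    "method": "GET",
    "result": "success",
    "server": "https://gwos.com",
    "status_code": "301",
}
after = enrich(before)
for key in ("resource", "service", "message", "status"):
    print(f"{key}={after[key]!r}")
```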

In summary, this advanced method of monitoring lets us map hundreds of plugins and thousands of metrics into hosts and services in GroundWork, and assign status, custom service descriptions, status text, and thresholds. Mastering the Telegraf configuration files is not trivial, but it is very rewarding: it lets you combine the data gathering capabilities of Telegraf with GroundWork's automated graphing, dashboards, notification, and reporting capabilities.

Setting up Telegraf to Process SNMP Traps

The ability of Telegraf to process SNMP Traps has been a focus of development at GroundWork for some time. With contributions from many authors, we now have a robust and highly flexible way of listening for and processing virtually any SNMP Trap. Of course, the types of traps you send, where you send them from (the devices or a manager), and the MIBs you use will all vary, so you should be prepared to add to the examples here as you go. The following instructions cover the basics and give you a good start on using this capability.

  1. Starting with the example above configured and working, in the gw8 directory, edit the docker-compose.override.yml file. Add the mibs volume and the UDP port to the telegraf service:

      telegraf:
        image: groundworkdevelopment/telegraf-docker:latest
        command: /docker_cmd.sh
        volumes:
           - ${PWD}/config/:/config
           - mibs:/usr/share/snmp/mibs
           - /var/run/docker.sock:/var/run/docker.sock
        ports:
          - "162:162/udp" # snmp trap port

    Also, in the top-level volumes: section, add the mibs: volume: 

    volumes:
    # Uncomment to enable tracing
    #  jaegertracing:
      mibs:
  2. Restart Telegraf:

    docker-compose kill telegraf
    docker-compose up -d

    Verify that the port is open: 

    docker-compose ps | grep telegraf
    dockergw8_telegraf_1   /docker_cmd.sh         Up             0.0.0.0:162->162/udp

    If you do not see 162/udp open and listening, it's likely that you will need to adjust the firewall settings. Unless you are otherwise blocking 162/udp, this may work:

    docker-compose down
    sudo service docker restart
    sudo service docker start
    docker-compose up -d
    docker-compose restart telegraf

    Be advised that this process requires sudo (root) access, and that monitoring will be paused during the restart. 

  3. Add MIBs to the mibs: volume. Which MIBs you add is up to you, but at a minimum you will need the IF-MIB (for interface status) to use this example. Many trap MIBs are available for download from device vendors. GroundWork does not distribute these, but all are handled in the same way. Place the desired MIB on the GroundWork server, and type, for example:

    docker cp IF-MIB.txt dockergw8_telegraf_1:/usr/share/snmp/mibs/IF-MIB.txt

    If you have a number of MIBs to transfer, you may want to archive them first and transfer them as a single compressed tar file, for example:

    docker cp mibs.tar.gz dockergw8_telegraf_1:/usr/share/snmp/mibs/mibs.tar.gz
    docker-compose exec telegraf  bash
     cd /usr/share/snmp/mibs/
     tar zxvf mibs.tar.gz
     rm mibs.tar.gz
     exit
  4. Add the snmp_trap input plugin definition to the gw8/config/telegraf.conf file: 

    # Receive SNMP traps
    [[inputs.snmp_trap]]

      # You need to tell the input where the mibs are
      path = ["/usr/share/snmp/mibs"]

      # Some tags we don't need:
      tagexclude = ["community", "oid", "version", "host"]

      # Filter by source IP if you want to
      [inputs.snmp_trap.tagpass]
        source = ["192.168*"]

    Note the path directive, which points to the MIB location. It was added in a recent change to the snmp_trap input plugin and is required. The tagexclude and tagpass filters are optional. 

  5. Add the required processor plugin configuration files to gw8/config/config.d as ".conf" files. Here are some useful examples (you can right click to copy address):

Configuration Notes

Delete the cruft

In GroundWork Monitor 8.2.1 and above, it is trivial to remove hosts and services that are no longer getting updated. This integration has the capacity to add a lot of resources, so don't be shy about removing them. In the Status Summary, you can do this with the Delete Selected action on the Search/Actions submenu.

Turn off debug when done

In general you won't want to have the debug level on in production, and you may even want to turn off the "file" output plugin, especially if you frequently poll a lot of metrics. 

Beware of overload

As with any monitoring method, just because you can monitor something, it doesn't mean you should. If you see a plugin coming back with a lot more metrics than you need, use a processor plugin to drop the extras. 

Filter out the infiltrators

You can filter incoming traps or logs, and add more conditions based on any of the input tags, like mib, name, etc. You can also use lists of values, e.g., 

# Filter by source IP
  [inputs.snmp_trap.tagpass]
    source = ["192.168*", "172.21.12*"]
    mib = ["IF-MIB"]
