Telegraf Monitoring (Advanced)
About Telegraf
Telegraf is an open source monitoring agent written in Go. It runs as a standalone executable or in a container. It is an efficient agent with a large base of plugins for many sources, but it has a lot of configuration options, so we consider our support for it an advanced feature. GroundWork supports monitoring resources with Telegraf, and Telegraf 1.21.2 includes the GroundWork Output plugin, which makes it easier to use with GroundWork servers (CE or EE).
System Requirements
You can run Telegraf according to the Telegraf documentation or in a container. Running Telegraf in a Docker container is easy, and GroundWork supports this, either on a GroundWork server or wherever you run your Docker containers. We prefer this method and cover it in this document.
You will need:
- A GroundWork server to receive the metrics gathered by Telegraf, available on a configurable TCP port
- A (free) Docker Hub account (https://hub.docker.com)
- Internet access from the GroundWork server (or you can also separately pull and load the Docker image)
- Network access from the GroundWork server to the assets you want to poll with Telegraf
- Root access to your GroundWork server, if you want to use Telegraf to process SNMP Traps or otherwise listen on ports below TCP/1024
Deployment Options
How you deploy Telegraf is up to you. Telegraf itself is just a Go executable with configuration files, and there are too many ways to run it to list here. These instructions will get you going with a Telegraf container on a GroundWork server, but there's no reason you can't configure Telegraf running somewhere else to post data to a GroundWork server, as long as you configure it correctly.
You can use the GroundWork Output plugin as we will describe, but it's also possible to set up Telegraf to collect and host metrics in Prometheus Exposition format, which can be polled at a specific URL by the GroundWork TCG-APM connector.
Why would you do this? Well, if you are running a few Telegraf agents and you want to be able to centrally turn on and off the importing of metrics on each to GroundWork without touching the agents, this is a good way to go. You can manage the Telegraf instances the APM connector polls from the APM configuration screen in GroundWork Monitor. It's also good if you are used to the Prometheus format Telegraf uses and want to keep using it.
Conversely, you wouldn't want to do this if you are using Telegraf to poll assets you can connect to from the GroundWork server - just use the containerized Telegraf with the GroundWork Output plugin and input plugins that poll those assets. Similarly, if you are using Telegraf passively to listen for messages like Syslog or SNMP Traps, then it makes sense to use the GroundWork Output plugin to immediately forward these to GroundWork, rather than poll them with the APM connector. You may miss data if the APM polling interval is longer than the gap between messages.
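For the Prometheus route described above, the Telegraf side might look like the following minimal sketch. It assumes Telegraf's standard prometheus_client output plugin. The listen port and path shown are that plugin's usual defaults, not GroundWork requirements, so adjust them to match the URL you give the APM connector.

```toml
# Sketch only: expose collected metrics in Prometheus Exposition format
# so that the GroundWork TCG-APM connector can poll them.
[[outputs.prometheus_client]]
  ## Address and port to listen on (assumed value; pick any free port).
  listen = ":9273"
  ## URL path the metrics are served under.
  path = "/metrics"
  ## Include the metric collection time in the exposition output.
  export_timestamp = true
```

If Telegraf runs in a container, the chosen port also has to be reachable from the connector, for example through a ports: entry in the compose file.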
Adding the Telegraf Application Type
In order for Telegraf to be permitted to create hosts/services in GroundWork, it must be added as an application.
To add the Telegraf Application Type, as the gwos user (su - gwos) and in the gw8 directory enter:
docker-compose exec pg psql gwcollagedb
Run the following query:
INSERT INTO ApplicationType(Name, DisplayName, Description, StateTransitionCriteria) VALUES ('TELEGRAF', 'TELEGRAF', 'Data from the Telegraf Plugin', 'Device;Host;ServiceDescription');
Restart GroundWork:
docker-compose down
docker-compose up -d
Setting Up Telegraf to Run in a Container
Assuming you have all the requirements listed above, log in to your GroundWork server as the gwos user, and in the gw8 directory, edit the docker-compose.override.yml file. Add the following:
telegraf:
  image: telegraf:1.21.2
  volumes:
    - ${PWD}/config/:/etc/telegraf/
Don't start it yet - it won't work until it is configured.
Create the configuration directory:
mkdir config
Now you are ready to configure Telegraf.
Configuring Telegraf
In order to start Telegraf, you will need a configuration file. The modification you just made to docker-compose.override.yml points the telegraf container to look for its configuration files in the gw8/config directory with a docker bind mount. Specifically, it looks for the file gw8/config/telegraf.conf.
The place to start is telegraf.conf. Here is an example file, telegraf.conf.example, which you can use as a starting point. Place it in the gw8/config directory, and remove the .example from the file name to use it.
Inside this file you will find some global agent configuration including a debug level setting for troubleshooting. You will want to turn this off in production, but leave it on for now:
## Log at debug level.
debug = true
## Log only error level messages.
quiet = false
After this you will see the output plugin configuration:
###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################
# Standard out - so we see metrics in the logs. Optional in production.
# # Send telegraf metrics to file(s)
[[outputs.file]]
  # ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout"]
  #
  data_format = "prometheus"
  prometheus_export_timestamp = true
  prometheus_sort_metrics = false
  prometheus_string_as_label = true
########################################################
#              GroundWork Output Plugin                #
########################################################
# Configuration for gw8 server to send metrics to
[[outputs.groundwork]]
  ## HTTP endpoint for your groundwork instance running in same container.
  url = "http://groundwork:8080"
  ## Agent uuid for Groundwork API Server
  agent_id = "XXXXXXXXXXXXXXXXXXXXXXXX"
  ## Username to access Groundwork API
  username = "user"
  ## Password to user with username
  password = "*****"
  resource_tag = "resource"
The first section defines a file output so we can see the metrics in the log file. Again, you will not need this in production, but it is very useful at first. The second definition is for the GroundWork Output plugin. Here's what you need to know:
- Leave the url at the default "http://groundwork:8080" if Telegraf runs in a container on the GroundWork server that receives the data.
- If you are deploying Telegraf on one GroundWork server and posting the data collected to another, set the url to the https address of the GroundWork server you want to post the data to (the same https URL as you would use for logging in).
- For the agent_id, you will need to generate a UUID by typing the following at the command line:
uuidgen
You will get back a string similar to 115756e3-c852-4657-99f3-c010381612b6. Set the agent_id to your string. It just has to be a random UUID, nothing special.
- For the username and password, you can use any valid username and password on your GroundWork server. We recommend making a user specifically for the Telegraf plugin so it is easier to control. It need not have admin privileges, a user level role is fine.
- For the resource_tag, you don't have to specify a string, but for compatibility with configurations that use APM, it's a good idea to use resource instead of the default host. That way you can more easily move back and forth between modalities on a single Telegraf instance.
For more details about the GroundWork Output plugin, see https://github.com/influxdata/telegraf/tree/release-1.21/plugins/outputs/groundwork
The next section is a typical input plugin configuration. For our example, we will use the http_response plugin:
##########################################################################
#########            INPUT PLUGINS     ###################################
##########################################################################
[[inputs.http_response]]
  ## address is Deprecated in 1.12, use 'urls'
  ## List of urls to query.
  urls = [
    "https://gwos.com",
  ]
What this does is tell Telegraf to check the URL(s) listed with the default set of metrics for the http_response plugin. The frequency is defined in the global settings as every 10 seconds.
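The 10 second frequency comes from the [agent] section of telegraf.conf. As a rough sketch, assuming the example file follows Telegraf's usual layout, the relevant settings look like this. The 60s per-plugin value is purely illustrative. Edit the existing [agent] section rather than adding a second one, and merge the interval line into the existing [[inputs.http_response]] block.

```toml
# Global collection settings (edit the [agent] section already in telegraf.conf).
[agent]
  ## Default data collection interval for all inputs.
  interval = "10s"
  ## How often collected metrics are flushed to the outputs.
  flush_interval = "10s"

# An individual input may override the global interval.
[[inputs.http_response]]
  urls = ["https://gwos.com"]
  ## Illustrative value: poll this plugin once a minute instead.
  interval = "60s"
```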
Starting Telegraf
When you are ready, start Telegraf:
docker-compose up -d
This will pull the Telegraf image, and start the container with the configuration you created.
Making Sense of the Configuration
At this point, if all is well the Telegraf container is started, and is monitoring the website or sites you placed in the input plugin URL list. You can verify it is started with docker-compose:
docker-compose ps | grep telegraf
dockergw8_telegraf_1 /docker_cmd.sh Up
To see the debug logs, just look at the container logs:
docker-compose logs -f telegraf
This will give you some output similar to the following:
telegraf_1 | # HELP http_response_content_length Telegraf collected metric
telegraf_1 | # TYPE http_response_content_length untyped
telegraf_1 | http_response_content_length{device="",method="GET",result="success",result_type="success",server="https://gwos.com",status_code="301"} 162 1640726020000
This is the result of the "file" output plugin writing the http_response results in Prometheus Exposition format to the log. The metric name in this case is "http_response_content_length", and the value is 162. The tags (inside the curly brackets) are the data the http_response plugin reports. These tags facilitate reporting the results to GroundWork.
A little further down in the log we see the debug message from the GroundWork plugin:
telegraf_1 | 2021-12-28T21:13:40Z I! [outputs.groundwork] Send request headers:map["Accept":"application/json" "Content-Type":"application/json" "GWOS-API-TOKEN":"61dde1cb-c9a7-4324-9244-47c036da9f90" "GWOS-APP-NAME":"telegraf"] status:200 method:"POST" url:"https://deimos.gwos/api/monitoring?dynamic=true" payload:"{\"context\":{\"appType\":\"TELEGRAF\",\"agentId\":\"115756e3-c852-4657-99f3-c0103815b2b6\",\"traceToken\":\"b2f80718-ed45-559e-8bfe-84f9c85eb6e9\",\"timeStamp\":\"1640726020293\",\"version\":\"1.0.0\"},\"resources\":[{\"name\":\"telegraf\",\"type\":\"host\",\"status\":\"HOST_UP\",\"lastCheckTime\":\"1640726020293\",\"services\":[{\"name\":\"http_response\",\"type\":\"service\",\"owner\":\"telegraf\",\"status\":\"SERVICE_OK\",\"lastCheckTime\":\"1640726020000\",\"metrics\":[{\"metricName\":\"response_time\",\"sampleType\":\"Value\",\"interval\":{\"endTime\":\"1640726020000\"},\"value\":{\"valueType\":\"DoubleType\",\"doubleValue\":0.065927795},\"unit\":\"1\"},{\"metricName\":\"http_response_code\",\"sampleType\":\"Value\",\"interval\":{\"endTime\":\"1640726020000\"},\"value\":{\"valueType\":\"IntegerType\",\"integerValue\":301},\"unit\":\"1\"},{\"metricName\":\"content_length\",\"sampleType\":\"Value\",\"interval\":{\"endTime\":\"1640726020000\"},\"value\":{\"valueType\":\"IntegerType\",\"integerValue\":162},\"unit\":\"1\"}
Parsing this, you can see that the GroundWork Output plugin applies defaults, namely "telegraf" for the hostname ( \"name\":\"telegraf\",\"type\":\"host\") and "http_response" for the service name (\"name\":\"http_response\",\"type\":\"service\"). The status is "SERVICE_OK", which is correct, but these are just the plugin's built-in default values. The host name is "telegraf" because the metric carries no tag named "resource" (the resource_tag configured above). Likewise, since there's no tag with key "status", which is what GroundWork uses to assign state, the service is "SERVICE_OK", again a configurable default value.
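The defaults described above can be changed in the output plugin section. The option names default_host and default_service_state are given here as we understand them from the plugin README for the 1.21 release. They do not appear in the example file, so treat them as assumptions and check them against the README linked earlier. The values shown are illustrative.

```toml
[[outputs.groundwork]]
  url = "http://groundwork:8080"
  agent_id = "XXXXXXXXXXXXXXXXXXXXXXXX"
  username = "user"
  password = "*****"
  ## Tag whose value becomes the GroundWork host name.
  resource_tag = "resource"
  ## Host name used when a metric has no resource tag (assumed option name).
  default_host = "telegraf"
  ## State used when a metric has no "status" tag (assumed option name).
  default_service_state = "SERVICE_OK"
```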
Obviously, this could be better. We should be able to create hosts based on what we monitor in the url list. We should be able to assign state, as well, based on the result of the http status codes, or a failure of DNS to find the host at all. We could even set thresholds to alarm on the metrics values in the Status Summary using the Edit - Thresholds feature.
These improvements require a "processor" plugin, or more than one. Here's an example: http_tests.conf.example
This file is commented as to what it does, so we won't go through it here. To use it, though, we need to make a few more changes:
Make a telegraf.d subdirectory under ./config:
mkdir ./config/telegraf.d
Place the http_tests.conf.example file in the gw8/config/telegraf.d directory. Make sure to change the filename to end in .conf (http_tests.conf), so it will be read after the next step.
Edit the docker-compose.override.yml file again, and add a command override:
telegraf:
  image: telegraf:1.21.2
  volumes:
    - ${PWD}/config/:/etc/telegraf/
  command: "--config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d"
Any time you make a change to the Telegraf configuration, you will need to restart the Telegraf container. Do this now:
From the gw8 directory, type:
docker-compose kill telegraf
docker-compose up -d
Reviewing the logs now, we see that more information is getting into our tags:
telegraf_1 | http_response_content_length{device="",message="The http response was a success and got a result of 301.",method="GET",resource="gwos.com",result="success",result_type="success",server="https://gwos.com",service="web_site_availability",status="SERVICE_WARNING",status_code="301"} 162 1640727630000
In other words, we now have values in our tag keys:
- message="The http response was a success and got a result of 301."
- resource="gwos.com"
- service="web_site_availability"
- status="SERVICE_WARNING"
These new tags are created by the processor plugins configured in http_tests.conf: the hostname (resource) is extracted from part of the URL with a regex, the message (status text) and service (service description) are templated tags, and the status is the result of an enumeration (enum) on the result code. These tags are what GroundWork expects, so they map directly into a host and service with multiple metrics. The results are visible in Status Summary.
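To give a feel for the kind of processor configuration involved, here is a hypothetical sketch built from Telegraf's standard regex, enum and template processors. It is not the contents of http_tests.conf.example. The regex pattern, the order values and the mapping of status codes to states are our own assumptions, chosen only to reproduce the tags shown above.

```toml
# Hypothetical sketch; not the contents of http_tests.conf.example.

# Derive the GroundWork host name from the polled URL:
# server="https://gwos.com" -> resource="gwos.com"
[[processors.regex]]
  order = 1
  namepass = ["http_response"]
  [[processors.regex.tags]]
    key = "server"
    pattern = "^https?://([^/:]+).*$"
    replacement = "${1}"
    result_key = "resource"

# Map the HTTP status code tag to a GroundWork service state.
[[processors.enum]]
  order = 2
  namepass = ["http_response"]
  [[processors.enum.mapping]]
    tag = "status_code"
    dest = "status"
    ## Assumed policy: anything not listed below is a warning.
    default = "SERVICE_WARNING"
    [processors.enum.mapping.value_mappings]
      "200" = "SERVICE_OK"
      "301" = "SERVICE_WARNING"

# Set a fixed service name.
[[processors.template]]
  order = 3
  namepass = ["http_response"]
  tag = "service"
  template = "web_site_availability"

# Build the status text from existing tags.
[[processors.template]]
  order = 4
  namepass = ["http_response"]
  tag = "message"
  template = 'The http response was a {{ .Tag "result" }} and got a result of {{ .Tag "status_code" }}.'
```

Each processor is limited to the http_response measurement with namepass so that metrics from other inputs pass through unchanged.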
In summary, this advanced method of monitoring allows us to map literally hundreds of plugins and thousands of metrics into hosts and services in GroundWork, as well as assign status, custom service descriptions, status text and thresholds. Mastery of the Telegraf configuration files is not trivial but it is very rewarding in terms of your ability to match the data gathering capabilities of Telegraf with GroundWork automated graphing, dashboards, notification and reporting capabilities.
Setting up Telegraf to Process SNMP Traps
The ability of Telegraf to process SNMP Traps has been a focus of development at GroundWork for some time. With contributions from many authors, we now have a robust way of listening for, and processing, with a high degree of flexibility, virtually any SNMP Trap. Of course, the type of traps you send, where you send them from (the devices themselves or a manager), the MIBs you use, etc. will all vary, and so you should be prepared to add to the examples here as you go. The following instructions tell you the basics, and give you a good start on using this capability.
Starting with the example above configured and working, in the gw8 directory, edit the docker-compose.override.yml file. Add the mibs volume and UDP port:
telegraf:
  image: telegraf:1.21.2
  command: "--config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d"
  volumes:
    - ${PWD}/config/:/etc/telegraf/
    - mibs:/usr/share/snmp/mibs
  ports:
    - "162:162/udp" # snmp trap port
Also, in the volumes: section, add the mibs: volume:
volumes:
  # Uncomment to enable tracing
  # jaegertracing:
  mibs:
Restart Telegraf:
docker-compose kill telegraf
docker-compose up -d
Verify that the port is open by checking the port mapping in the docker-compose ps output for the container:
dockergw8_telegraf_1 /docker_cmd.sh Up 0.0.0.0:162->162/udp
If you do not see 162/udp open and listening, it's likely that you will need to adjust the firewall settings. Unless you are otherwise blocking 162/udp, this may work:
docker-compose down
sudo service docker restart
sudo service docker start
docker-compose up -d
docker-compose restart telegraf
Be advised that this process requires sudo (root) access, and that monitoring will be paused during the restart.
Add MIB files to the mibs: volume. The MIBs you add are up to you, but at a minimum you will need IF-MIB (for interface status) to use this example. There are many trap MIBs available for download from the device providers. GroundWork does not distribute these, but all are handled in the same way. Place the desired MIB (or, preferably, a consistent set of MIB files) on the GroundWork server, and type, for example:
docker cp IF-MIB.txt dockergw8_telegraf_1:/usr/share/snmp/mibs/IF-MIB.txt
If you have a number of MIBs to transfer, you may want to bundle them into a single compressed tar archive first, for example to transfer mibs.tar.gz:
docker cp mibs.tar.gz dockergw8_telegraf_1:/usr/share/snmp/mibs/mibs.tar.gz
docker-compose exec telegraf bash
cd /usr/share/snmp/mibs/
tar zxvf mibs.tar.gz
rm mibs.tar.gz
exit
Add the snmp_trap input plugin definition to the gw8/config/telegraf.conf file:
# # Receive SNMP traps
[[inputs.snmp_trap]]
  # You need to tell the input where the mibs are
  path = ["/usr/share/snmp/mibs"]
  # Some tags we don't need:
  tagexclude = ["community", "oid", "version", "host"]
  # Filter by source IP if you want to
  # [inputs.snmp_trap.tagpass]
  #   source = ["192.168*"]
Note the path directive, which points to the MIB location. It was added in a recent change to the snmp_trap input plugin and is required. The tagexclude and tagpass filters are optional.
Add the required processor plugin configuration files to gw8/config/telegraf.d as ".conf" files. Here are some useful examples:
- snmp_trap_ifmib.conf - A simple example of processing linkDown and linkUp traps
- snmp_trap_hitachi-dfraid.conf - An example of setting service states based on the status of a Hitachi DFRAID™ device
- snmp_trap_cyberarc.conf - An example of processing traps from Cyberarc™ software
- snmp_trap_dollaru.conf - An example of processing traps from Dollar Universe™ software
Many more such configurations are possible.
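As an illustration of what such a file can contain, the following hypothetical sketch turns linkDown and linkUp traps into a GroundWork host, service and state. It is not the contents of snmp_trap_ifmib.conf. The service name link_status and the choice of state for each trap are assumptions made for the example. It relies on the source, name and mib tags that the snmp_trap input attaches to each trap.

```toml
# Hypothetical sketch; not the contents of snmp_trap_ifmib.conf.

# Use the trap sender's address as the GroundWork host name.
[[processors.template]]
  order = 1
  namepass = ["snmp_trap"]
  tag = "resource"
  template = '{{ .Tag "source" }}'

# Report all link traps under one service name (illustrative name).
[[processors.template]]
  order = 2
  namepass = ["snmp_trap"]
  tag = "service"
  template = "link_status"

# Translate the trap name into a GroundWork service state.
[[processors.enum]]
  order = 3
  namepass = ["snmp_trap"]
  [[processors.enum.mapping]]
    tag = "name"
    dest = "status"
    [processors.enum.mapping.value_mappings]
      linkUp = "SERVICE_OK"
      linkDown = "SERVICE_WARNING"
```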
Configuration Notes
Delete the cruft
In GroundWork Monitor 8.2.1 and above, it is trivial to remove hosts and services that are no longer getting updated. This integration has the capacity to add a lot of resources, so don't be shy about removing them. In the Status Summary, you can do this with the Delete Selected action on the Search/Actions submenu.
Turn off debug when done
In general you won't want to have the debug level on in production, and you may even want to turn off the "file" output plugin, especially if you frequently poll a lot of metrics.
Beware of overload
As with any monitoring method, just because you can monitor something, it doesn't mean you should. If you see a plugin coming back with a lot more metrics than you need, use a processor plugin to drop the extras.
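One way to drop extra metrics, sketched here with Telegraf's standard metric filtering options rather than a dedicated processor, is a fieldpass list on the input itself. The field names are the three that appear in the http_response payload shown earlier. Which ones you keep is your choice.

```toml
[[inputs.http_response]]
  urls = ["https://gwos.com"]
  ## Keep only these fields; all other fields from this input are dropped.
  fieldpass = ["response_time", "http_response_code", "content_length"]
```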
Filter out the infiltrators
You can filter incoming traps or logs, and add more conditions based on any of the input tags, like mib, name, etc. You can also use lists of values, e.g.,
# Filter by source IP
[inputs.snmp_trap.tagpass]
  source = ["192.168*", "172.21.12*"]
  mib = ["IF-MIB"]