Create separate measurement tables in InfluxDB for Icinga 2 NRPE checks


Last updated on February 20th 2023 - Listed in Icinga, Monitoring, Influx, Database


In a previous article I described how Icinga 2 performance graphs can be created using InfluxDB and Grafana. At the end of that article I added a special note concerning NRPE checks:

Note: For NRPE checks you will have to adapt the graphs because this performance data is stored in the "nrpe" measurement table.

My monitoring architecture relies heavily on remotely executed checks using check_nrpe. Therefore almost all system-related information (CPU, memory, network I/O, disk space, etc.) was collected in one and the same measurement table, nrpe:

root@inf-mon02-t:~# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.10.0
InfluxDB shell 0.10.0
> USE icinga2
Using database icinga2
> SHOW MEASUREMENTS
name: measurements
------------------
name
apt
disk
hostalive
http
icinga
load
ping4
ping6
procs
ssh
swap
users

At the beginning of this year, in January 2017, I had some problems with PNP4Nagios and NRPE checks. I was unable to control the graph behavior of certain remotely executed checks, because PNP4Nagios interpreted all of these checks as the same plugin: check_nrpe. With a workaround (applying a special variable containing the NRPE check command) I was able to create separate PNP4Nagios templates for each individual remote NRPE check command (see the article Creating custom PNP4Nagios template in Icinga 2 for NRPE checks for more details).
Where am I going with this? The same workaround can also be applied to the InfluxdbWriter object!

First I modified the apply rule which adds the remote disk usage checks (you guessed it, using check_nrpe) to the Linux hosts:

apply Service "Diskspace " for (partition_name => config in host.vars.partitions) {
  import "generic-service"

  vars += config
  if (!vars.warn) { vars.warn = "15%" }
  if (!vars.crit) { vars.crit = "5%" }
  if (!vars.iwarn) { vars.iwarn = "15%" }
  if (!vars.icrit) { vars.icrit = "5%" }
  if (!vars.service) { vars.service = "generic-service" }

  import vars.service

  display_name = "Diskspace " + partition_name
  check_command = "nrpe"
  vars.nrpe_command = "check_disk"
  vars.nrpe_arguments = [ vars.warn, vars.crit, partition_name, vars.iwarn, vars.icrit ]
  vars.influx_append = "_$nrpe_command$"

  assign where host.address && host.vars.os == "Linux"
  ignore where host.vars.applyignore.partitions == true
}

Note: For more information about such advanced Icinga2 configurations using apply rules, take a look at Icinga 2: Advanced usage of arrays/dictionaries for monitoring of partition.

Take a look at the following line:

  vars.influx_append = "_$nrpe_command$"

Here I define a new variable, influx_append. It is a string consisting of an underscore (_) followed by the value of the variable nrpe_command, which is check_disk, as you can see two lines above it. Whenever this disk usage check runs, the service object now also carries the variable influx_append, which can then be used in the InfluxdbWriter.

The InfluxdbWriter feature object needs to be modified so that the name of the measurement table to use (or create) includes the value of the influx_append variable. This is how I did it:

root@inf-mon02-t:~# cat /etc/icinga2/features-enabled/influxdb.conf
/**
 * The InfluxdbWriter type writes check result metrics and
 * performance data to an InfluxDB HTTP API
 */

library "perfdata"

object InfluxdbWriter "influxdb" {
  //host = "127.0.0.1"
  //port = 8086
  //database = "icinga2"
  //flush_threshold = 1024
  //flush_interval = 10s
  //host_template = {
  //  measurement = "$host.check_command$"
  //  tags = {
  //    hostname = "$host.name$"
  //  }
  //}
  service_template = {
  //  measurement = "$service.check_command$"
    measurement = "$service.check_command$$influx_append$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
    }
  }
}

As you can see, I kept the defaults but un-commented the service_template part. The original measurement definition is still there (commented out), followed by my slightly modified version:

    measurement = "$service.check_command$$influx_append$"

So the name of the measurement table now gets a suffix. The nice thing is that this doesn't change anything for locally executed checks like http or ldap, because for them the variable influx_append is not set and resolves to an empty string. As soon as a disk usage check is executed through check_nrpe, however, the variable contains a value and the measurement name resolves to nrpe_check_disk.

After a restart of Icinga 2, the following can be seen in the debug logs (you must enable debug level in /etc/icinga2/features-enabled/mainlog.conf):

[2017-12-12 14:13:16 +0100] debug/InfluxdbWriter: Add to metric list: 'nrpe_check_disk,hostname=remoteserver01,service=Diskspace\ /var,metric=/var value=387973120 1513084396'.

This can now be verified inside InfluxDB:


root@inf-mon02-t:~# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.10.0
InfluxDB shell 0.10.0
> use icinga2
Using database icinga2
> show measurements
name: measurements
------------------
name
apt
disk
dns
hostalive
http
icinga
ldap
load
nrpe
nrpe_check_disk
ping4
ping6
procs
ssh
swap
users

Indeed, the measurement table nrpe_check_disk was created! Let's check the content:

> select * from nrpe_check_disk
name: nrpe_check_disk
---------------------
time            hostname        metric  service         value
1513084394000000000     remoteserver01    /var    Diskspace /var  3.9845888e+08
1513084395000000000     remoteserver02    /       Diskspace /     2.524971008e+09
1513084395000000000     remoteserver01    /tmp    Diskspace /tmp  1.048576e+06
1513084396000000000     remoteserver02    /var    Diskspace /var  3.8797312e+08
1513084396000000000     remoteserver02    /tmp    Diskspace /tmp  1.048576e+06
1513084451000000000     remoteserver01    /var    Diskspace /var  3.9845888e+08
1513084452000000000     remoteserver02    /       Diskspace /     2.524971008e+09
1513084452000000000     remoteserver01    /tmp    Diskspace /tmp  1.048576e+06
1513084454000000000     remoteserver02    /tmp    Diskspace /tmp  1.048576e+06
1513084454000000000     remoteserver02    /var    Diskspace /var  3.8797312e+08
1513084508000000000     remoteserver01    /var    Diskspace /var  3.9845888e+08
1513084510000000000     remoteserver02    /       Diskspace /     2.524971008e+09
1513084510000000000     remoteserver01    /tmp    Diskspace /tmp  1.048576e+06
1513084512000000000     remoteserver02    /var    Diskspace /var  3.8797312e+08
1513084512000000000     remoteserver02    /tmp    Diskspace /tmp  1.048576e+06

Success! Now I have a dedicated measurement table for this type of remote check. This makes queries easier than having the data of all remote NRPE checks in one measurement table.
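The same pattern should carry over to other NRPE checks. Here is a minimal sketch, assuming a remote NRPE command called check_mem that takes a warning and a critical threshold as arguments. The command name, the thresholds and the service name are hypothetical and only serve as an illustration:

```
apply Service "Memory" {
  import "generic-service"

  check_command = "nrpe"
  // Hypothetical NRPE command; it has to be defined in nrpe.cfg on the remote host
  vars.nrpe_command = "check_mem"
  // Placeholder thresholds (warning, critical)
  vars.nrpe_arguments = [ "80", "90" ]
  // Same trick as for the disk check: the suffix resolves to "_check_mem"
  vars.influx_append = "_$nrpe_command$"

  assign where host.address && host.vars.os == "Linux"
}
```

With the modified service_template from above, the performance data of such a check would be expected in a measurement table named nrpe_check_mem.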

Update July 30th 2018

As you can see in the comments below, once "influx_append" was added to the InfluxDB feature config, Icinga 2 started writing a lot of warnings like these into /var/log/icinga2/icinga2.log:

[2018-07-30 09:45:57 +0200] warning/MacroProcessor: Macro 'influx_append' is not defined.
        (0) Resolving macros for string '$service.check_command$$influx_append$'

This happened for all service checks which don't have the special variable "influx_append" defined, for example "http" or "ssh". I tried to define a global value "influx_append" in constants.conf (see my comment) but this didn't work.

However, when I defined a service variable (vars.influx_append) in the base service template, which is inherited by all other services, and set it to an empty string, all the warnings were gone (because it is now a defined variable).

Basically all my services are using different templates (based on check times, criticality, etc). But all the different templates have one thing in common: A base template. And in this base template I defined this variable:

# cat /etc/icinga2/zones.d/global-templates/templates/service-base-template.conf
################################################################
# SERVICE TEMPLATE DEFINITIONS
################################################################
# service-base
# This service template is being inherited by other service templates
# Use it for settings which apply on ALL services
#################################
template Service "service-base" {
        notes_url = "/pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$"
        check_period = "24x7"
        vars.influx_append = ""
}

As this is the lowest-level definition of a service, the variable is overwritten by any service which sets its own value later (see the "Diskspace" apply rule above).
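To make the override order concrete, here is a minimal sketch of the inheritance chain. It assumes that "generic-service" imports "service-base" directly (the real template chain is not shown in this article) and uses a hypothetical service name:

```
// Assumption: "generic-service" imports the base template directly.
template Service "generic-service" {
  import "service-base"   // inherits vars.influx_append = ""
}

// Hypothetical NRPE-based service which overrides the empty default.
apply Service "Example NRPE check" {
  import "generic-service"

  check_command = "nrpe"
  vars.nrpe_command = "check_disk"
  // Set after the import, so this value replaces the inherited ""
  vars.influx_append = "_$nrpe_command$"

  assign where host.vars.os == "Linux"
}
```

Services which never set their own value keep the empty string, so their measurement name remains the plain check command (for example http), but without the MacroProcessor warning.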



Comments (newest first)

Steve from wrote on Feb 18th, 2020:

Another very simple solution is the following, then you have the services separated by name:

service_template = {
  measurement = "$service.name$"
  tags = {
    hostname = "$host.name$"
    service = "$service.name$"
  }
}


ck from Switzerland wrote on Jul 17th, 2018:

VerboEse, I have the same problem here with the warnings on all other services. I did not fix it yet but an idea would be to define influx_append on top level. Maybe in constants, but not sure if that works. Otherwise on all other services (influx_append = "") but that's kind of overkill, I agree.


VerboEse from wrote on Jul 16th, 2018:

Hi.
The idea is great! There is one problem though: for services not done via nrpe I get errors in my icinga log:
---
[2018-07-16 18:44:30 +0200] warning/MacroProcessor: Macro 'influx_append' is not defined.
Context:
(0) Resolving macros for string '$service.check_command$$influx_append$'
(1) Processing check result for 'icinga.mydomain.cxm!cluster'
---
I don't understand the definition enough for getting rid of these.

