Automatic SLA reporting from Icinga and push into Confluence page


Published on - last updated on March 6th 2020 - Listed in Icinga Monitoring Nagios Linux Atlassian


Back in 2010 I created automatic availability reporting from a Nagios installation (see How to create automatic PDF from Nagios Availability Reports?). The idea was pretty simple: at a monthly interval (generally running on the 1st of the month), create a PDF from the availability report (using the previous month's data) and send it by mail.

Methods (and knowledge) have changed since then, and I was asked to create automatic SLA reporting using the statistics from Icinga 2. Initially, sending the report by e-mail would have been enough, but when I came across Confluence's REST API, the goal became to add the reporting directly into Confluence.

Note: Icinga's new interface icingaweb2 does not support availability reporting as of April 2018. We're still using icinga2-classicui for this purpose.

The script I created is split into several steps. Let's go through them.

Step one: Define your (more or less fixed) base variables

At the beginning of the script, I defined some base variables which are used later on.

# Basic variable definitions
yearlastmonth=$(dateutils.dadd today -1mo -f '%Y')
monthlastmonth=$(dateutils.dadd today -1mo -f '%m')
availurl="http://icinga.example.com/cgi-bin/icinga2-classicui/avail.cgi"
icingauser="icingaadmin"
icingapass="password"
wikiuser="slareporter"
wikipass="dd2ddAADw2"

You might have noticed that I'm using dateutils.dadd instead of date to determine the year and month of the previous month. Take a look at my article "Bash: date -d 1 month ago still shows the same month" to see why.
The availurl variable contains the address to your Nagios' or Icinga's avail.cgi.
The two sets of credentials are used to log in to avail.cgi and to the Confluence Wiki.
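If dateutils is not installed, the previous month can also be derived with plain shell arithmetic. The following is only a sketch of that idea; the function name prev_month is made up for this example and is not part of the original script.

```shell
# prev_month YEAR MONTH -> prints "YYYY MM" for the month before.
# Hypothetical helper; the parentheses run it in a subshell so its
# variables do not leak into the calling script.
prev_month() (
  year=$1
  month=${2#0}            # strip a leading zero so 08 and 09 are not read as octal
  if [ "$month" -eq 1 ]; then
    year=$((year - 1))    # January rolls back to December of the previous year
    month=12
  else
    month=$((month - 1))
  fi
  printf '%04d %02d\n' "$year" "$month"
)

# Split the result into the two variables used throughout this article.
lastmonth=$(prev_month "$(date +%Y)" "$(date +%m)")
yearlastmonth=${lastmonth% *}
monthlastmonth=${lastmonth#* }
```

Because only the year and the month number are involved, the day of the month on which the script runs does not matter.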

Step two: Create a PDF from the availability report

With wkhtmltopdf, the availability report seen in the Nagios or Icinga2-ClassicUI interface can be saved as a PDF (including formatting, colors, etc). The command is pretty simple:

xvfb-run -a -s "-screen 0 640x480x16" wkhtmltopdf --username $icingauser --password $icingapass "${availurl}?show_log_entries=&host=internet&service=HTTP+www.example.com&timeperiod=lastmonth" /tmp/${yearlastmonth}${monthlastmonth}-www.example.com.pdf

xvfb-run is used to run wkhtmltopdf in a non-interactive way. Otherwise wkhtmltopdf would complain about a missing X display.

Of course the important parameters in the requested URL are: host=internet (which is the host object), service=HTTP+www.example.com (the service object we want the report from) and timeperiod=lastmonth (get the statistics for the previous month).

Since it is currently April 2018, the PDF document is saved as /tmp/201803-www.example.com.pdf.
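Before uploading anything, it can be worth checking that a plausible PDF was actually written. A minimal sketch, assuming a helper called is_pdf (a name invented here, not part of the original script) that only looks at the file size and the first four bytes:

```shell
# is_pdf FILE -> succeeds only if FILE is non-empty and starts with "%PDF".
# Hypothetical helper, not part of the original script.
is_pdf() {
  [ -s "$1" ] && [ "$(head -c 4 "$1")" = "%PDF" ]
}

# Possible use right after the wkhtmltopdf call:
# is_pdf "/tmp/${yearlastmonth}${monthlastmonth}-www.example.com.pdf" || exit 1
```

This does not validate the report's content; it only catches a missing, empty or obviously non-PDF file.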

Step three: Upload the PDF to the relevant Confluence page

The upload of a file/attachment is pretty easy, compared to changing the content of a page (more on that later):

curl -s -S -u "${wikiuser}:${wikipass}" -X POST -H "X-Atlassian-Token: no-check" -F "file=@/tmp/${yearlastmonth}${monthlastmonth}-www.example.com.pdf" -F "comment=${yearlastmonth}-${monthlastmonth} www.example.com" "https://wiki.example.com/confluence/rest/api/content/12345678/child/attachment"| python -mjson.tool

Obviously the generated PDF is uploaded using -F "file=@/tmp/${yearlastmonth}${monthlastmonth}-www.example.com.pdf". 
Don't forget to adjust the Confluence host address (here wiki.example.com) and the page ID (here 12345678). You can find the page ID either in the address or in the "Page information" of the relevant page.

After a successful upload, the PDF appears as an attachment on that Confluence page.
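In a cron job nobody reads the JSON response, so the script could look at the HTTP status code instead. The sketch below assumes a small helper named http_ok (made up for this example); the commented curl call is the upload command from above with curl's -o and -w options added.

```shell
# http_ok CODE -> succeeds for any 2xx HTTP status code.
# Hypothetical helper, not part of the original script.
http_ok() {
  case $1 in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible use with the upload command. -o writes the response body to a
# file (the file name is an assumption), -w '%{http_code}' prints the status.
# code=$(curl -s -S -o /tmp/upload-response.json -w '%{http_code}' \
#   -u "${wikiuser}:${wikipass}" -X POST -H "X-Atlassian-Token: no-check" \
#   -F "file=@/tmp/${yearlastmonth}${monthlastmonth}-www.example.com.pdf" \
#   "https://wiki.example.com/confluence/rest/api/content/12345678/child/attachment")
# http_ok "$code" || { echo "Upload failed with HTTP status $code" >&2; exit 1; }
```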

Step four: Get the availability percentage

As you might know, the availability report (the one we just saved as a PDF) can also be exported in other formats: csv, json and xml.
With a JSON parser like jshon, the value of the field "percent_known_time_ok" (inside the "services" array) can be extracted directly:

availpercent=$(curl -s -u "${icingauser}:${icingapass}" "${availurl}?show_log_entries=&hostservice=internet^HTTP+www.example.com&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringnotrunning=yes&includesoftstates=no&initialassumedhoststate=0&initialassumedservicestate=0&timeperiod=lastmonth&backtrack=8&jsonoutput" | jshon -e avail -e service_availability -e services -a -e percent_known_time_ok | awk '{printf("%.3f\n", $1)}')

If you want to see the structure of the json output, simply click on the "export to json" button in the user interface.
I'm using the awk command at the end to limit the value to 3 decimals. Note that printf rounds the value, so 99.335654 becomes 99.336.
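awk's printf rounds to the requested precision rather than cutting digits off. If you would rather truncate (so that an SLA value is never rounded up), one possible approach is to cut the string instead of formatting the number. Both function names below are made up for this comparison:

```shell
# round3: the same awk call as in the article (printf rounds).
round3() {
  awk '{ printf("%.3f\n", $1) }'
}

# trunc3: keep at most three decimals by cutting the string, never rounding up.
# Missing decimals are padded with zeros.
trunc3() {
  awk '{ split($1, p, "."); printf("%s.%s\n", p[1], substr(p[2] "000", 1, 3)) }'
}

echo 99.335654 | round3   # prints 99.336
echo 99.335654 | trunc3   # prints 99.335
```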

But under some circumstances the json output cannot be handled by jshon ("too big integer"). This happened on a service where recurring downtimes were set at the beginning of the month but removed at the end of the month. This caused a miscalculation in the report and produced a huge value ("time_critical_unscheduled": 18446744073709548794). I opened an issue on the GitHub project for jshon to address this. In the meantime I created the following workaround:

# In some cases, we could hit a json parsing error due to a too big integer. In such a case we try the csv output.
if [[ $? -gt 0 ]] || [[ -z $availpercent ]]
  then availpercent=$(curl -s -u "${icingauser}:${icingapass}" "${availurl}?show_log_entries=&hostservice=internet^HTTP+www.example.com&timeperiod=lastmonth&rpttimeperiod=24x7&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringnotrunning=yes&includesoftstates=no&initialassumedservicestate=6&backtrack=8&content_type=csv" | grep "internet" | awk -F';' '{print $11}' | sed "s/'//g" | sed "s/%//g")
fi

In case the previous command failed or the previously defined variable $availpercent is empty, the csv output of the same service is used instead. The parsing is of course different; here I'm interested in the 11th column (which is percent_known_time_ok).
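Since both the json and the csv path can end up with an empty or unexpected value, a final sanity check before touching the Wiki might be useful. A minimal sketch, assuming a helper named is_percentage (invented for this example):

```shell
# is_percentage VALUE -> succeeds if VALUE is a plain number between 0 and 100.
# Hypothetical helper, not part of the original script.
is_percentage() {
  case $1 in
    # reject: empty, any character besides digits and dot, two dots, a lone dot
    ''|*[!0-9.]*|*.*.*|.) return 1 ;;
  esac
  awk -v v="$1" 'BEGIN { exit !(v + 0 >= 0 && v + 0 <= 100) }'
}

# Possible use after both attempts to fill $availpercent:
# is_percentage "$availpercent" || { echo "No usable availability value" >&2; exit 1; }
```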

Step five: Retrieve the Confluence page's information and content

Here's a very important piece of information: if you want to change the content of a Confluence page, you need to:

  • Retrieve the full content (body.storage)
  • Retrieve the current page version number and other information
  • Change the full content by adding your changes
  • Increase the version number
  • Submit the full content (old + your change), including the new version number, page ID, space key and page title

Let's do this slowly:

# Get current version number and content from wiki page
wikiversion=$(curl -s -u "${wikiuser}:${wikipass}" "https://wiki.example.com/confluence/rest/api/content/12345678?expand=version" | python -mjson.tool | jshon -e version -e number)
wikicontent=$(curl -s -u "${wikiuser}:${wikipass}" "https://wiki.example.com/confluence/rest/api/content/12345678?expand=body.storage" | python -mjson.tool | jshon -e body -e storage -e value)

Here again I'm using jshon to get the values of the fields and save them into variables "wikiversion" and "wikicontent".

Note: The value saved in $wikiversion is a number, while the value in $wikicontent is a string that already contains its surrounding double quotes.
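Because the later steps rely on exactly these two properties, the script could verify them before going any further. The two helper names below are assumptions made for this sketch:

```shell
# is_integer VALUE -> succeeds for a non-empty string consisting only of digits.
is_integer() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# is_quoted VALUE -> succeeds if VALUE starts and ends with a double quote.
is_quoted() {
  case $1 in
    '"'*'"') return 0 ;;
    *) return 1 ;;
  esac
}

# Possible use right after retrieving the page information:
# is_integer "$wikiversion" || { echo "Unexpected version: $wikiversion" >&2; exit 1; }
# is_quoted "$wikicontent"  || { echo "Unexpected page content" >&2; exit 1; }
```

An empty $wikicontent (for example after a failed login) would otherwise be submitted as the new page body.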

Step six: Make the changes

The Wiki page I prepared simply contained a table and I wanted to add a new row for the previous month at the end of the table.

[Image: Icinga SLA reporting into Confluence]

This means I have to add the new row right before the code marking the end of the table. I chose sed for this:

# Change content (add new row at bottom of table)
newcontent=$(echo "$wikicontent" | sed "s#<\\\\/tbody>#<tr><td>Website www.example.com<\\\/td><td>https://www.example.com<\\\/td><td>${yearlastmonth}-${monthlastmonth}<\\\/td><td>${availpercent}%<\\\/td><\\\/tr><\\\/tbody>#")

Note the crazy amount of backslashes. This is because the escaped backslashes need to remain in the final submit (compare with the value of $wikicontent).
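To see what the backslashes do, the substitution can be tried on a small sample first. The sample string and the function name add_row are made up for this demonstration; the sample only imitates the JSON-escaped form (slashes written as \/) in which the page content arrives.

```shell
# Made-up sample in the same JSON-escaped form as $wikicontent.
sample='"<table><tbody><tr><td>old<\/td><\/tr><\/tbody><\/table>"'

# add_row CONTENT TEXT -> inserts a one-cell row before the closing tbody tag.
# Inside double quotes the shell turns \\\\ into \\ and \\\/ into \\/,
# and sed then reads \\ as one literal backslash.
# TEXT must not contain #, & or backslashes, which are special to sed here.
add_row() {
  printf '%s\n' "$1" | sed "s#<\\\\/tbody>#<tr><td>$2<\\\/td><\\\/tr><\\\/tbody>#"
}

add_row "$sample" "new"
# prints "<table><tbody><tr><td>old<\/td><\/tr><tr><td>new<\/td><\/tr><\/tbody><\/table>"
```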

We only need to increase the version number of the Wiki page:

# Increment version number
newversion=$(( $wikiversion + 1 ))

Step seven: Upload the changes

OK, now we're finally ready to upload the change to Confluence:

# Update Wiki page
curl -s -u "${wikiuser}:${wikipass}" -X PUT -H 'Content-Type: application/json' -d "{\"id\":\"12345678\",\"type\":\"page\",\"title\":\"SLA Reporting www.example.com\",\"space\":{\"key\":\"SysServices\"},\"body\":{\"storage\":{\"value\":$newcontent,\"representation\":\"storage\"}},\"version\":{\"number\":$newversion}}" https://wiki.example.com/confluence/rest/api/content/12345678 | python -mjson.tool

Note that $newcontent was not put into additional double-quotes. As mentioned before, the original value ($wikicontent) already starts and ends with double-quotes.
$newversion was also not put into (double-) quotes because it's a number, not a string.
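The inline JSON with all its escaped quotes is hard to read. One possible alternative is to assemble the payload with printf, where the quotes can stay unescaped. This is only a sketch: build_payload is a name made up here, and it assumes that the page title and space key contain no characters that need JSON escaping.

```shell
# build_payload ID TITLE SPACEKEY CONTENT VERSION -> prints the JSON body for the PUT request.
# CONTENT must already be a JSON string including its double quotes (like $newcontent).
# Hypothetical helper, not part of the original script.
build_payload() {
  printf '{"id":"%s","type":"page","title":"%s","space":{"key":"%s"},"body":{"storage":{"value":%s,"representation":"storage"}},"version":{"number":%d}}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Possible use with the update command:
# payload=$(build_payload 12345678 "SLA Reporting www.example.com" SysServices "$newcontent" "$newversion")
# curl -s -u "${wikiuser}:${wikipass}" -X PUT -H 'Content-Type: application/json' \
#   -d "$payload" https://wiki.example.com/confluence/rest/api/content/12345678
```

The %s placeholder for the content is deliberately not wrapped in quotes, and the version uses %d, which mirrors the two notes above.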

Step eight (final step): Automate it

I went one step further: instead of having a huge script with hundreds of lines for each service we want SLA reporting for, I added some parameters at the beginning:

# Get user-given variables (dynamic)
while getopts "T:U:W:P:H:S:" Input;
do
       case ${Input} in
       T)      title=${OPTARG};;
       U)      url=${OPTARG};;
       W)      wikiid=${OPTARG};;
       P)      wikipagetitle=${OPTARG};;
       H)      icingahost=${OPTARG};;
       S)      icingaservice=${OPTARG};;
       *)      echo "Wrong option given."
               exit 1
               ;;
       esac
done

# Before we do anything, check if we have all information
if [[ -z $title ]]; then echo "Missing title, use -T"; exit 1
elif [[ -z $url ]]; then echo "Missing URL, use -U"; exit 1
elif [[ -z $wikiid ]]; then echo "Missing Wiki page ID, use -W"; exit 1
elif [[ -z $wikipagetitle ]]; then echo "Missing Wiki page title, use -P"; exit 1
elif [[ -z $icingahost ]]; then echo "Missing Icinga host name of this SLA, use -H"; exit 1
elif [[ -z $icingaservice ]]; then echo "Missing Icinga service name of this SLA, use -S"; exit 1
fi

This way I can launch the script for many services, each with its own Wiki page (if necessary):

# crontab -l
# SLA Reportings
00 02 1 * * /root/scripts/icinga-sla-reporting.sh -T "Website www.example.com" -U "www.example.com" -W 12345678 -P "SLA Reporting www.example.com" -H internet -S "HTTP+www.example.com" >/dev/null
01 02 1 * * /root/scripts/icinga-sla-reporting.sh -T "Rest API api.example.com" -U "api.example.com" -W 12312399 -P "SLA Reporting api.example.com" -H internet -S "HTTP+api.example.com" >/dev/null
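Inside the script, the hard-coded values from the earlier steps then have to be replaced by these parameters. How exactly is not shown in this article, so the following is only a sketch; the helper names pdf_path and report_url are assumptions and simply mirror the file name and address used in step two.

```shell
# pdf_path YEAR MONTH URL -> file name for the generated PDF.
# Hypothetical helper, not part of the original script.
pdf_path() {
  printf '/tmp/%s%s-%s.pdf\n' "$1" "$2" "$3"
}

# report_url AVAILURL HOST SERVICE -> avail.cgi address of last month's report.
# Hypothetical helper, not part of the original script.
report_url() {
  printf '%s?show_log_entries=&host=%s&service=%s&timeperiod=lastmonth\n' "$1" "$2" "$3"
}

# Possible use with the variables set by getopts (-U, -H and -S):
# pdffile=$(pdf_path "$yearlastmonth" "$monthlastmonth" "$url")
# xvfb-run -a -s "-screen 0 640x480x16" wkhtmltopdf --username "$icingauser" --password "$icingapass" \
#   "$(report_url "$availurl" "$icingahost" "$icingaservice")" "$pdffile"
```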

Update March 6, 2020: Using Thruk now

Unfortunately recent Icinga 2 updates removed the icinga2-classicui interface (see Icinga2-classicui is gone after installing Icinga 2 2.10) and therefore also the cgi scripts needed to create the SLA statistics. As a workaround I have adjusted the script to use Thruk's SLA reporting plugin.

There is also (slow) work in progress on a reporting module for icingaweb2. It is generally available, but it lacks depth compared to Thruk's reporting module.



Comments (newest first)

Claudio from Switzerland wrote on May 31st, 2018:

Ole, I can confirm your findings. It seems there is a problem in the calculation (no log entries are shown for past data) in Icinga 2.8. It works fine in Icinga 2.6. I also commented in the Github issue you opened: https://github.com/Icinga/icinga2/issues/6346/.


Ole from Norway wrote on May 31st, 2018:

Lovely script - but we do have problems with various variables.

We can't get avail.cgi to report on lastmonth, lastweek, yesterday, but it works for thismonth, last7days and last24hours.

Looks like a bug - its the same in the Classic-UI web interface.

https://github.com/Icinga/icinga2/issues/6346

Any ideas?

Thanks
Ole

