It has been very quiet around check_couchdb_replication, a monitoring plugin for CouchDB (synchronous) replications, in the past few years.
But now the plugin has been updated and a new version is available! Several recent changes made it into the latest release (20220223), including:
CouchDB 3 added additional information to the JSON output of a replication. One of the new fields is the "missing_revisions_found" counter. Because the plugin was written for CouchDB 2 in 2018, it interpreted any occurrence of the string "missing" as "replication not found". This resulted in the following error, even for existing replications:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r e9046b127bd99afc9cd208b94d6ea136
COUCHDB REPLICATION CRITICAL - Replication for e9046b127bd99afc9cd208b94d6ea136 not found
The new plugin release fixes this:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r e9046b127bd99afc9cd208b94d6ea136
COUCHDB REPLICATION OK - Replication e9046b127bd99afc9cd208b94d6ea136 is running
Thanks to Guillaume Subiron for creating the PR. The newest version now uses a slightly different approach: it matches on "error: not_found" instead of "reason: missing".
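To illustrate why the old match produced false positives, here is a minimal sketch, not the plugin's actual code. The two JSON strings are shortened, hypothetical responses, and the function names old_check and new_check are made up for this example:

```shell
#!/usr/bin/env bash
# Shortened, hypothetical responses; real CouchDB output contains more fields.
found='{"doc_id":"e9046b127bd99afc9cd208b94d6ea136","state":"running","info":{"missing_revisions_found":0}}'
notfound='{"error":"not_found","reason":"missing"}'

# Old-style check (assumed): any occurrence of "missing" means "not found".
old_check() {
  if printf '%s' "$1" | grep -q 'missing'; then
    echo "not found"
  else
    echo "found"
  fi
}

# New-style check (assumed): only an explicit not_found error means "not found".
new_check() {
  if printf '%s' "$1" | grep -q '"error":"not_found"'; then
    echo "not found"
  else
    echo "found"
  fi
}

old_check "$found"      # trips over "missing_revisions_found"
new_check "$found"      # existing replication is recognized
new_check "$notfound"   # a really missing replication is still detected
```

Matching on the error field rather than on a generic word keeps the check robust if later CouchDB versions add more counters containing "missing".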
In the previous version from 2018, the replication detection only showed the doc_id of the replication:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -d
COUCHDB AVAILABLE REPLICATIONS: "e9046b127bd99afc9cd208b94d1ca1b6" "e9046b127bd99afc9cd208b94d6ea136" "e9046b127bd99afc9cd208b94d1cc8e2" "e9046b127bd99afc9cd208b94d1c081a" "e9046b127bd99afc9cd208b94d6e9b59"
This information is enough to run a check on a single replication, but it does not tell a human reader which replication is behind a doc_id.
In the latest release the output now also shows the replication source (which includes the database) in the parentheses following the doc_id:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -d
COUCHDB AVAILABLE REPLICATIONS: e9046b127bd99afc9cd208b94d1ca1b6 (http://couchdb1.example.com:5984/_users/) e9046b127bd99afc9cd208b94d6ea136 (http://couchdb1.example.com:5984/marketingtools/) e9046b127bd99afc9cd208b94d1cc8e2 (http://couchdb1.example.com:5984/councillors-ch-dev/) e9046b127bd99afc9cd208b94d1c081a (http://localhost:5984/q-items-dev/) e9046b127bd99afc9cd208b94d6e9b59 (http://localhost:5984/marketingtools/)
The plugin now also reports the number of replications. Although this is just a minor change to the output itself, it is a helpful new feature for users who create graphs from performance data, because the number of replications can now be tracked over a long period.
The old version did not show the number of replications:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All replications running
The new version shows the number of successfully running replications in the output and in performance data:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;
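For readers who want to process these values outside of a monitoring system, the following is a small sketch of how the performance data could be extracted from the plugin output with plain bash. The helper name perf_value is hypothetical, and the sample line is copied from the output above:

```shell
#!/usr/bin/env bash
# Sample plugin output, taken from the example above.
output='COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;'

# perf_value LABEL OUTPUT
# Prints the value of the given performance data label (hypothetical helper).
perf_value() {
  local wanted="$1" perfdata item label value
  # Everything after the pipe character is performance data.
  perfdata="${2#*| }"
  for item in $perfdata; do
    label="${item%%=*}"
    value="${item#*=}"
    value="${value%%;*}"   # drop the empty warn/crit/min/max fields
    if [ "$label" = "$wanted" ]; then
      echo "$value"
      return 0
    fi
  done
  return 1
}

perf_value replok "$output"
perf_value replfail "$output"
```

In practice, a monitoring core such as Nagios or Icinga parses this part of the output itself and hands it to the graphing backend.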
Another problem, which was described in issue #5, was that the plugin returned a CRITICAL alert on non-continuous (one-time) replications. Of course, this kind of replication only runs once and is then marked as "completed" in its state, yet the plugin expected a "running" state:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION CRITICAL - 2 replications not running ("doc_id":"e9046b127bd99afc9cd208b94d5e4fcb" "error_count":0 "info":{"revisions_checked":899,"doc_id":"e9046b127bd99afc9cd208b94d5f4538" "error_count":0 "info":{"revisions_checked":899,)
To handle this properly, all non-continuous (one-time) replications are now excluded from the "ALL" check by default:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;
If the one-time replications should be part of the check, they can be included again using the new -i parameter:
$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL -i
COUCHDB REPLICATION CRITICAL: 2 continuous replications not running - Details: e9046b127bd99afc9cd208b94d5e4fcb (state: completed, error count: 0) e9046b127bd99afc9cd208b94d5f4538 (state: completed, error count: 0) | replok=38;;;; replfail=2;;;;
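The decision logic behind the "ALL" check can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: the input format (one replication per line as "doc_id state continuous") and the function name check_all are hypothetical, and the output is shortened. Only the exit codes 0 (OK) and 2 (CRITICAL) follow the usual monitoring plugin convention:

```shell
#!/usr/bin/env bash
# check_all INCLUDE_ONETIME  (reads "doc_id state continuous" lines from stdin)
# INCLUDE_ONETIME=yes mimics the -i parameter; anything else skips one-time replications.
check_all() {
  local include_onetime="$1" replok=0 replfail=0 id state continuous
  while read -r id state continuous; do
    [ -z "$id" ] && continue
    # Skip non-continuous replications unless they were explicitly included.
    if [ "$continuous" != "true" ] && [ "$include_onetime" != "yes" ]; then
      continue
    fi
    if [ "$state" = "running" ]; then
      replok=$((replok + 1))
    else
      replfail=$((replfail + 1))
    fi
  done
  if [ "$replfail" -gt 0 ]; then
    echo "CRITICAL | replok=${replok};;;; replfail=${replfail};;;;"
    return 2
  fi
  echo "OK | replok=${replok};;;; replfail=0;;;;"
  return 0
}

# Hypothetical sample: one continuous and one completed one-time replication.
sample=$'aaa running true\nbbb completed false'

printf '%s\n' "$sample" | check_all no
printf '%s\n' "$sample" | check_all yes || echo "exit code: $?"
```

Excluding one-time replications by default avoids permanent alerts for replications that have simply finished, while the opt-in keeps the previous behavior available.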