During a routine check of an InfluxDB instance, a lot of weird characters showed up in the list of measurements (show measurements):
As this database stores the performance data from Icinga, the measurement names should only contain the names of executed checks, such as "nrpe", "http" or a combination of these (e.g. "check_nrpe_netio"; see create separate measurement tables in InfluxDB for Icinga 2 NRPE checks for how to do this). Yet the output clearly shows some form of corrupted data being returned by InfluxDB.
The same corruption and weird characters also show up when listing all the series in this database (using show series):
A post in the InfluxDB forums describes the same problem, where corrupt measurements had been discovered as well. User LeJav suggested deleting the corrupted series and measurement entries.
For the series entries this partly worked, using the regular expression /^[^a-z]/, which matches every name that does not start with a lowercase letter:
These series can be dropped and they are gone afterwards:
root@inf-monix01-p:~# echo "drop series from /^[^a-z]/" |influx -username admin -password secret -database icinga
root@inf-monix01-p:~# echo "show series from /^[^a-z]/" |influx -username admin -password secret -database icinga
root@inf-monix01-p:~#
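The regular expression /^[^a-z]/ only looks at the first character of a name. The following is a minimal local sketch (not from the original troubleshooting) that flags names containing a non-printable character at any position and renders them with cat -v, so the terminal never receives the raw bytes. The function name show_corrupt_names is hypothetical, and treating "not printable in the C locale" as "corrupt" is an assumption: it would also flag legitimate non-ASCII (UTF-8) names.

```shell
#!/bin/sh
# Hypothetical helper: read names (one per line) on stdin and print only those
# that contain a non-printable byte, anywhere in the name.
# LC_ALL=C makes [:print:] mean ASCII 0x20-0x7E, so control characters and
# DEL are flagged (and so is any non-ASCII byte, e.g. legitimate UTF-8).
# grep -a treats the input as text even if it contains odd bytes;
# cat -v shows control characters as ^A, ^B, ... and DEL as ^?.
show_corrupt_names() {
  LC_ALL=C grep -a '[^[:print:]]' | cat -v
}

# Example usage (assumes the influx CLI and the credentials from this article):
# echo "show series" | influx -username admin -password secret -database icinga | show_corrupt_names
```

Because the check covers every position, it also catches names that start with a lowercase letter followed by control characters, which slip through /^[^a-z]/.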
However, some corrupt series remain, because their names start with an alphabetic character, for example:
For the measurements the story is different, as drop measurement does not support a from clause (and therefore no regular expression). An attempt to drop each corrupt measurement in a while loop failed as well, due to the corrupt character encoding in the terminal:
A potential workaround is to use the HTTP API instead of the influx command. When the measurements are listed through the InfluxDB API, the names appear differently (JSON-escaped), yet still contain illegal or corrupt characters:
root@inf-monix01-p:~# curl -s "http://admin:secret@localhost:8086/query?db=icinga&q=show%20measurements" | jq
{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "measurements",
          "columns": [
            "name"
          ],
          "values": [
            [
              "check_nrpe2_check_mem"
            ],
            [
              "check_nrpe2_check_netio"
            ],
            [
              "check_nwc_health"
            ],
            [
              "check_postgres"
            ],
            [
              "check_rabbitmq_connections"
            ],
            [
              "check_rancher2"
            ],
            [...]
            [
              "enste"
            ],
            [
              "esxi_hardware"
            ],
            [
              "ftp"
            ],
            [
              "h\u0001\u0000\\*\u007f\u0000\u0000`\u0001\u0003\\*\u007f\u0000\u0000ostname=et-radoi05-p"
            ],
            [
              "h\u0002\u0000\\*\u007f\u0000\u0000*X\u0000\\*\u007f\u0000\u0000name=nzzonl-mysql01-prod"
            ],
            [
              "h\u0004\u0000P*\u007f\u0000\u0000**\u0005P*\u007f\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000studio-sw-02"
            ],
            [
              "h\u0004\u0000\\*\u007f\u0000\u0000P*\u000b\\*\u007f\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000ev115a-1"
            ],
            [
              "hostalive"
            ],
            [...]
            [
              "vmware-esx-soap-host-io-write-latency"
            ],
            [
              "vmware-esx-soap-host-mem-usage"
            ],
            [
              "vmware-esx-soap-host-net-usage"
            ]
          ]
        }
      ]
    }
  ]
}
I replaced all illegal characters with an asterisk (*); otherwise my own database storing this article would run into problems. The corrupt measurements still clearly stand out in the output.
As long as these corrupt characters have to be passed through the command line, the measurements cannot be dropped this way, so a different solution must be found.
Taking a closer look at the InfluxDB documentation, one comes across the influx_inspect command. It offers a couple of sub-commands for specific tasks. One of them (deletetsm) caught my eye:
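Because the API returns the names JSON-escaped, the affected measurements can at least be listed without sending raw bytes to the terminal. Here is a sketch, assuming jq 1.5 or later (for any/1) and the response layout of the API output above; the function name corrupt_measurements_json is hypothetical, and counting code points below 32 plus 127 (DEL) as "corrupt" is my assumption. Note that this only identifies the names; it does not make them usable in a drop measurement statement.

```shell
#!/bin/sh
# Hypothetical helper: read an InfluxDB 1.x /query JSON response on stdin and
# print the measurement names containing control characters (code point < 32
# or 127). @json re-escapes each name, so e.g. a NUL byte is printed as the
# text \u0000 instead of being written raw to the terminal.
corrupt_measurements_json() {
  jq -r '.results[].series[]?.values[][0]
         | select(explode | any(. < 32 or . == 127))
         | @json'
}

# Example usage (host, credentials and database as in this article);
# -G together with --data-urlencode builds a correctly encoded query string:
# curl -s -G "http://localhost:8086/query" -u admin:secret \
#   --data-urlencode "db=icinga" \
#   --data-urlencode "q=SHOW MEASUREMENTS" | corrupt_measurements_json
```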
deletetsm
Use deletetsm -measurement to delete a measurement in a raw TSM file (from specified shards). Use deletetsm -sanitize to remove all tag and field keys containing non-printable Unicode characters in a raw TSM file (from specified shards).
Non-printable Unicode characters? This sounds familiar!
Before influx_inspect can be run against the data files, the InfluxDB service needs to be stopped:
root@inf-monix01-p:~# systemctl stop influxd
The command can now be run, but it must be pointed at the TSM files directly. Luckily the command accepts multiple file paths as its last arguments, so a shell wildcard can be used:
root@inf-monix01-p:~# influx_inspect deletetsm -v -sanitize /var/lib/influxdb/data/icinga/autogen/*/*.tsm
2021/10/08 08:42:23 processing: /var/lib/influxdb/data/icinga/autogen/1005/000002404-000000003.tsm
2021/10/08 08:42:27 processing: /var/lib/influxdb/data/icinga/autogen/1014/000002395-000000003.tsm
2021/10/08 08:42:37 processing: /var/lib/influxdb/data/icinga/autogen/1023/000002385-000000003.tsm
2021/10/08 08:42:48 processing: /var/lib/influxdb/data/icinga/autogen/1032/000002394-000000003.tsm
2021/10/08 08:42:58 processing: /var/lib/influxdb/data/icinga/autogen/1041/000002406-000000003.tsm
2021/10/08 08:43:09 processing: /var/lib/influxdb/data/icinga/autogen/1050/000002403-000000003.tsm
2021/10/08 08:43:20 processing: /var/lib/influxdb/data/icinga/autogen/105/000001428-000000003.tsm
2021/10/08 08:43:28 processing: /var/lib/influxdb/data/icinga/autogen/1059/000002367-000000003.tsm
2021/10/08 08:43:37 processing: /var/lib/influxdb/data/icinga/autogen/1068/000002351-000000003.tsm
[...]
influx_inspect now inspects all the TSM files, eventually reaching the corrupt blocks and deleting them:
Once this command finishes, InfluxDB can be started again:
root@inf-monix01-p:~# systemctl start influxdb
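The sanitize step could also be wrapped in a small function. This is a sketch under assumptions, not part of the original procedure: it assumes the data directory layout <datadir>/<database>/<retention policy>/<shard>/*.tsm seen in the output above, and both the function name sanitize_tsm and the INSPECT_CMD variable are hypothetical; INSPECT_CMD only exists so the command can be dry-run with echo. Using find -exec ... {} + instead of a shell glob hands the paths over in batches, which avoids "Argument list too long" errors on databases with many shards.

```shell
#!/bin/sh
# Sketch: run "deletetsm -v -sanitize" on every TSM file of one database and
# retention policy. InfluxDB must be stopped before calling this.
# Usage: sanitize_tsm <datadir> <database> <retention_policy>
# INSPECT_CMD (hypothetical) defaults to influx_inspect; set it to "echo"
# for a dry run that only prints the arguments.
sanitize_tsm() {
  inspect=${INSPECT_CMD:-influx_inspect}
  # -exec ... {} + passes the found paths to the command in batches
  find "$1/$2/$3" -type f -name '*.tsm' \
    -exec "$inspect" deletetsm -v -sanitize {} +
}

# Example run, mirroring the steps in this article:
# systemctl stop influxd
# sanitize_tsm /var/lib/influxdb/data icinga autogen
# systemctl start influxdb
```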
Let's check if there are still any corrupt measurements around:
root@inf-monix01-p:~# echo "show measurements" | influx -username admin -password secret -database icinga
name: measurements
name
----
check_apachestatus
check_disk_smb
check_elasticsearch
check_epaper_download
check_epaper_download_aaz
check_es_system
check_fast_lta
check_haproxy
check_http_phantom
check_ibm_hardware
check_infoblox
check_mssql
check_mysql
check_mysql_slavestatus
check_netapp_ng
check_netapp_ontap
check_nginx_status
check_nrpe2
check_nrpe2_check_apt
check_nrpe2_check_cpu_stats
check_nrpe2_check_disk
check_nrpe2_check_diskio_device
check_nrpe2_check_load
check_nrpe2_check_lxc
check_nrpe2_check_mailq
check_nrpe2_check_mem
check_nrpe2_check_netio
check_nrpe2_check_procs_total
check_nrpe2_check_smart
check_nrpe2_check_swap
check_nrpe2_check_users
check_nrpe2_check_zypper
check_nwc_health
check_postgres
check_rabbitmq_connections
check_rancher2
check_rbl
check_redis
check_sap
check_ssl_cert
check_ucs_path
check_vmware_snapshots
check_vsx
check_win_disk_usage
check_win_net_usage
cluster
dns
enste
esxi_hardware
ftp
hostalive
http
icinga
inx_status
ldap
logstash
mongodb
netscaler
nrpe
nrpe_check_apt
nrpe_check_cpu_stats
nrpe_check_disk
nrpe_check_diskio_device
nrpe_check_drivesize
nrpe_check_files
nrpe_check_load
nrpe_check_lxc
nrpe_check_mailq
nrpe_check_mem
nrpe_check_netio
nrpe_check_postgres
nrpe_check_procs_total
nrpe_check_smart
nrpe_check_swap
nrpe_check_tcp
nrpe_check_users
nrpe_check_varnish
nrpe_check_varnish_name
nrpe_check_zypper
nscp
nscp_CPULOAD
nscp_MEMUSE
nscp_USEDDISKSPACE
ping
rvice=Network IO docker0
smtp
ssh
tcp
vmware-esx-dc-volumes
vmware-esx-soap-host-cpu-usage
vmware-esx-soap-host-io
vmware-esx-soap-host-io-read-latency
vmware-esx-soap-host-io-write-latency
vmware-esx-soap-host-mem-usage
vmware-esx-soap-host-net-usage
root@inf-monix01-p:~#
All (corrupt) measurements containing weird characters are now gone! The series (show series) were checked as well, and they are all OK, too!
Alexander Reichle-Schmehl from wrote on Jul 15th, 2022:
Thanks for your hint!
Based upon your ideas, we basically did:
echo "drop series from /^[^a-z]/" |influx -database icinga2
echo "drop series from /^[a-z][^a-z]/" |influx -database icinga2
echo "drop series from /^[a-z][a-z][^a-z_]/" |influx -database icinga2
and that seems to have cleaned up our influx database!
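Before running such a cascade of drop statements, it may be worth checking that none of the regular expressions hits a legitimate name. A rough sketch, assuming that grep -E in the C locale behaves like InfluxQL's regular expressions for these simple ASCII character classes; preview_drop_patterns is a hypothetical name. It counts how many of the names fed on stdin (for example a known-good measurement list) each of the three patterns would match.

```shell
#!/bin/sh
# Hypothetical helper: read names (one per line) on stdin and print, for each
# of the three patterns used above, how many names it would match.
# The input is buffered in a temp file so odd bytes never pass through a
# shell variable.
preview_drop_patterns() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  for re in '^[^a-z]' '^[a-z][^a-z]' '^[a-z][a-z][^a-z_]'; do
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
    # that from aborting scripts running under "set -e"
    n=$(LC_ALL=C grep -a -c -E "$re" "$tmp" || true)
    printf '%s\t%s\n' "$re" "$n"
  done
  rm -f "$tmp"
}
```

A non-zero count on a list of known-good names means that pattern would also drop legitimate data.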