How to fix data corruption and strange characters in InfluxDB measurements and series names

Written by - 1 comments

Published on - Listed in Influx Database


During a routine check on an InfluxDB, a lot of weird characters showed up in the list of measurements (show measurements):

As this is the database used for monitoring performance data from Icinga, the measurement names should only contain executed check names, such as "nrpe", "http" or a mix of it (e.g. "check_nrpe_netio", see create separate measurement tables in InfluxDB for Icinga 2 NRPE checks how to do this). Yet the output clearly shows some form of corrupted data being returned from InfluxDB.

The same corruption and weird characters also show up when listing all the series in this database (using show series):

Trying to delete the corrupted entries

A post in the InfluxDB forums goes into the same direction, where corrupt measurements were discovered. User LeJav responded to delete these corrupted series and measurements entries.

This kind of worked for the series entries by using a special regular expression /^[^a-z]/:

These series can be dropped and they are gone afterwards:

root@inf-monix01-p:~# echo "drop series from /^[^a-z]/" |influx -username admin -password secret -database icinga
root@inf-monix01-p:~# echo "show series from /^[^a-z]/" |influx -username admin -password secret -database icinga
root@inf-monix01-p:~#

However there are still certain series which remain corrupt, but started with an alphabetic character, for example:

But for the measurements the story is different, as drop measurement does not support a from statement. But even by trying to put all this into a while loop, this failed in the terminal due to the corrupt character encoding:

A potential workaround would be to not use the influx command, but to use the HTTP API instead. By listing the measurements using the InfluxDB API, the measurement names appear different, yet still contain illegal or corrupt characters:

root@inf-monix01-p:~# curl -s "http://admin:secret@localhost:8086/query?db=icinga&q=show%20measurements" | jq
{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "measurements",
          "columns": [
            "name"
          ],
          "values": [
            [
              "check_nrpe2_check_mem"
            ],
            [
              "check_nrpe2_check_netio"
            ],
            [
              "check_nwc_health"
            ],
            [
              "check_postgres"
            ],
            [
              "check_rabbitmq_connections"
            ],
            [
              "check_rancher2"
            ],
[...]
            [
              "enste"
            ],
            [
              "esxi_hardware"
            ],
            [
              "ftp"
            ],
            [
              "h\u0001\u0000\\*\u007f\u0000\u0000`\u0001\u0003\\*\u007f\u0000\u0000ostname=et-radoi05-p"
            ],
            [
              "h\u0002\u0000\\*\u007f\u0000\u0000*X\u0000\\*\u007f\u0000\u0000name=nzzonl-mysql01-prod"
            ],
            [
              "h\u0004\u0000P*\u007f\u0000\u0000**\u0005P*\u007f\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000studio-sw-02"
            ],
            [
              "h\u0004\u0000\\*\u007f\u0000\u0000P*\u000b\\*\u007f\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000ev115a-1"
            ],
            [
              "hostalive"
            ],
[...]
            [
              "vmware-esx-soap-host-io-write-latency"
            ],
            [
              "vmware-esx-soap-host-mem-usage"
            ],
            [
              "vmware-esx-soap-host-net-usage"
            ]
          ]
        }
      ]
    }
  ]
}

I replaced all illegal characters by an asterisk (*) - or my own database storing this article would run into problems. But the corrupt measurements clearly stand out from the output.

As long as dealing with corrupt characters on the command line, a different solution must be found.

influx_inspect to the rescue!

Taking a closer look at the InfluxDB documentation, one comes across the influx_inspect command. This command uses a couple of sub-commands to execute specific tasks. One sub-task (deletetsm) caught my eye:

deletetsm
Use deletetsm -measurement to delete a measurement in a raw TSM file (from specified shards). Use deletetsm -sanitize to remove all tag and field keys containing non-printable Unicode characters in a raw TSM file (from specified shards).

Non-printable Unicode characters? This sounds familiar!

Before influx_inspect can run on InfluxDB, the service needs to be stopped first:

root@inf-monix01-p:~# systemctl stop influxd

The command can now be run, but it needs to be run against the tsm files directly. Luckily the path input (the last input of the command) supports a wildcard:

root@inf-monix01-p:~# influx_inspect deletetsm -v -sanitize /var/lib/influxdb/data/icinga/autogen/*/*.tsm
2021/10/08 08:42:23 processing: /var/lib/influxdb/data/icinga/autogen/1005/000002404-000000003.tsm
2021/10/08 08:42:27 processing: /var/lib/influxdb/data/icinga/autogen/1014/000002395-000000003.tsm
2021/10/08 08:42:37 processing: /var/lib/influxdb/data/icinga/autogen/1023/000002385-000000003.tsm
2021/10/08 08:42:48 processing: /var/lib/influxdb/data/icinga/autogen/1032/000002394-000000003.tsm
2021/10/08 08:42:58 processing: /var/lib/influxdb/data/icinga/autogen/1041/000002406-000000003.tsm
2021/10/08 08:43:09 processing: /var/lib/influxdb/data/icinga/autogen/1050/000002403-000000003.tsm
2021/10/08 08:43:20 processing: /var/lib/influxdb/data/icinga/autogen/105/000001428-000000003.tsm
2021/10/08 08:43:28 processing: /var/lib/influxdb/data/icinga/autogen/1059/000002367-000000003.tsm
2021/10/08 08:43:37 processing: /var/lib/influxdb/data/icinga/autogen/1068/000002351-000000003.tsm
[...]

InfluxDB now starts to inspect all the tsm files, eventually leading to the corrupt blocks and deleting them:

Once this command finishes, InfluxDB can be started again:

root@inf-monix01-p:~# systemctl start influxdb

Let's check if there are still any corrupt measurements around:

root@inf-monix01-p:~# echo "show measurements" | influx -username admin -password secret -database icinga
name: measurements
name
----
check_apachestatus
check_disk_smb
check_elasticsearch
check_epaper_download
check_epaper_download_aaz
check_es_system
check_fast_lta
check_haproxy
check_http_phantom
check_ibm_hardware
check_infoblox
check_mssql
check_mysql
check_mysql_slavestatus
check_netapp_ng
check_netapp_ontap
check_nginx_status
check_nrpe2
check_nrpe2_check_apt
check_nrpe2_check_cpu_stats
check_nrpe2_check_disk
check_nrpe2_check_diskio_device
check_nrpe2_check_load
check_nrpe2_check_lxc
check_nrpe2_check_mailq
check_nrpe2_check_mem
check_nrpe2_check_netio
check_nrpe2_check_procs_total
check_nrpe2_check_smart
check_nrpe2_check_swap
check_nrpe2_check_users
check_nrpe2_check_zypper
check_nwc_health
check_postgres
check_rabbitmq_connections
check_rancher2
check_rbl
check_redis
check_sap
check_ssl_cert
check_ucs_path
check_vmware_snapshots
check_vsx
check_win_disk_usage
check_win_net_usage
cluster
dns
enste
esxi_hardware
ftp
hostalive
http
icinga
inx_status
ldap
logstash
mongodb
netscaler
nrpe
nrpe_check_apt
nrpe_check_cpu_stats
nrpe_check_disk
nrpe_check_diskio_device
nrpe_check_drivesize
nrpe_check_files
nrpe_check_load
nrpe_check_lxc
nrpe_check_mailq
nrpe_check_mem
nrpe_check_netio
nrpe_check_postgres
nrpe_check_procs_total
nrpe_check_smart
nrpe_check_swap
nrpe_check_tcp
nrpe_check_users
nrpe_check_varnish
nrpe_check_varnish_name
nrpe_check_zypper
nscp
nscp_CPULOAD
nscp_MEMUSE
nscp_USEDDISKSPACE
ping
rvice=Network IO docker0
smtp
ssh
tcp
vmware-esx-dc-volumes
vmware-esx-soap-host-cpu-usage
vmware-esx-soap-host-io
vmware-esx-soap-host-io-read-latency
vmware-esx-soap-host-io-write-latency
vmware-esx-soap-host-mem-usage
vmware-esx-soap-host-net-usage
root@inf-monix01-p:~# 

All (corrupt) measurements with a weird character are now gone! The same was also checked on the series (show series) and they all are OK, too!


Add a comment

Show form to leave a comment

Comments (newest first)

Alexander Reichle-Schmehl from wrote on Jul 15th, 2022:

Thanks for your hint!

Based upon your ideas, we basically did:
echo "drop series from /^[^a-z]/" |influx -database icinga2
echo "drop series from /^[a-z][^a-z]/" |influx -database icinga2
echo "drop series from /^[a-z][a-z][^a-z_]/" |influx -database icinga2

and that seems to have cleaned up our influx database!


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder