While upgrading Elasticsearch on a test ELK stack from 6.8.6 to 7.15, Elasticsearch failed to start after the upgrade. Troubleshooting showed that a single outdated index was the cause.
Elasticsearch 6.8.6 was upgraded to 7.15, the newest version at the time, using the APT repositories from Elastic.
Preparing to unpack .../elasticsearch_7.15.0_amd64.deb ...
Unpacking elasticsearch (7.15.0) over (6.8.6) ...
But once Elasticsearch 7.15 was installed and the configuration files elasticsearch.yml and jvm.options were adjusted for the new version, ES failed to start and threw the following error:
[2021-09-30T14:40:27,111][ERROR][o.e.b.Bootstrap ] [elk01] Exception
java.lang.IllegalStateException: The index [idx/HBCW_-NwTQObOdb0twtN8g] was created with version [5.5.1] but the minimum compatible version is [6.0.0-beta1]. It should be re-indexed in Elasticsearch 6.x before upgrading to 7.15.0.
at org.elasticsearch.cluster.metadata.IndexMetadataVerifier.checkSupportedVersion(IndexMetadataVerifier.java:90) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.cluster.metadata.IndexMetadataVerifier.verifyIndexMetadata(IndexMetadataVerifier.java:75) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.gateway.GatewayMetaState.upgradeMetadata(GatewayMetaState.java:236) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.gateway.GatewayMetaState.upgradeMetadataForNode(GatewayMetaState.java:220) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.gateway.GatewayMetaState.start(GatewayMetaState.java:150) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.node.Node.start(Node.java:916) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:313) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:408) [elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:167) [elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:158) [elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:75) [elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:114) [elasticsearch-cli-7.15.0.jar:7.15.0]
at org.elasticsearch.cli.Command.main(Command.java:79) [elasticsearch-cli-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:123) [elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:81) [elasticsearch-7.15.0.jar:7.15.0]
The error message recommends re-indexing. According to the reindex documentation, re-indexing copies the affected index (HBCW_-NwTQObOdb0twtN8g) to a new index name, after which the old index can be deleted. But that is no longer an option at this point, because Elasticsearch needs to be running for its API to be reachable.
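For reference, a re-index request of the kind the error message asks for could be built as in the following minimal sketch. It only constructs the POST /_reindex request and sends nothing; the destination name "idx-reindexed", the helper name build_reindex_request and the base URL are assumptions made for illustration.

```python
import json
import urllib.request


def build_reindex_request(source_index, dest_index, base_url="http://localhost:9200"):
    """Build (but do not send) a POST /_reindex request that copies one index into another."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return urllib.request.Request(
        url=f"{base_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "idx-reindexed" is a hypothetical destination name, not taken from the cluster.
    request = build_reindex_request("idx", "idx-reindexed")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it to a running 6.x node, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

The request can only be sent to a node that is actually running, which is exactly the problem here.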
In general, re-indexing should happen before upgrading to a new major version. But this is exactly what a test ELK stack is for: figuring out what will break and how to fix it.
So the idea was to downgrade Elasticsearch to 6.8.x again and then fix/re-index the mentioned index.
root@elk01:~# apt-get install elasticsearch=6.8.6
After the configuration files were adjusted once again (this time for the 6.8.x version), Elasticsearch successfully started.
Note: Elasticsearch's systemd unit file (/usr/lib/systemd/system/elasticsearch.service) also needs to be adjusted if the package downgrade does not replace it. The systemd unit file from version 7.15 does not work with Elasticsearch 6.8.6!
Here's a full example of a working 6.8.6 unit file:
ck@elk01:~$ cat /usr/lib/systemd/system/elasticsearch.service
[Unit]
Description=Elasticsearch
Documentation=http://www.elastic.co
Wants=network-online.target
After=network-online.target
[Service]
RuntimeDirectory=elasticsearch
PrivateTmp=true
Environment=ES_HOME=/usr/share/elasticsearch
Environment=ES_PATH_CONF=/etc/elasticsearch
Environment=PID_DIR=/var/run/elasticsearch
EnvironmentFile=-/etc/default/elasticsearch
WorkingDirectory=/usr/share/elasticsearch
User=elasticsearch
Group=elasticsearch
ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet
# StandardOutput is configured to redirect to journalctl since
# some error messages may be logged in standard output before
# elasticsearch logging system is initialized. Elasticsearch
# stores its logs in /var/log/elasticsearch and does not use
# journalctl by default. If you also want to enable journalctl
# logging, you can simply remove the "quiet" option from ExecStart.
StandardOutput=journal
StandardError=inherit
# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65535
# Specifies the maximum number of processes
LimitNPROC=4096
# Specifies the maximum size of virtual memory
LimitAS=infinity
# Specifies the maximum file size
LimitFSIZE=infinity
# Disable timeout logic and wait until process is stopped
TimeoutStopSec=0
# SIGTERM signal is used to stop the Java process
KillSignal=SIGTERM
# Send the signal only to the JVM rather than its control group
KillMode=process
# Java process is never killed
SendSIGKILL=no
# When a JVM receives a SIGTERM signal it exits with code 143
SuccessExitStatus=143
[Install]
WantedBy=multi-user.target
# Built for packages-6.8.4 (packages)
And Elasticsearch runs again:
root@elk01:~# netstat -lntup|grep 9200
tcp6 0 0 :::9200 :::* LISTEN 24793/java
The error message from Elasticsearch 7.15 refers to the index as [idx/HBCW_-NwTQObOdb0twtN8g], which looks like an index name followed by the index ID. To be sure, let's look up the ID and confirm the real name of this index:
root@elk01:~# curl -s http://localhost:9200/_cat/indices?pretty | grep HBCW
green open idx HBCW_-NwTQObOdb0twtN8g 5 0 0 0 1.2kb 1.2kb
Alright, the name of the affected index is actually "idx". A weird name, as most other indices in this ELK stack end with a date. The size of this index is also kind of weird: only 1.2 KB!
Let's find out what data is held in this index. First the settings:
root@elk01:~# curl -H 'Content-Type: application/json' -X GET http://localhost:9200/idx?pretty
{
"idx" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index" : {
"number_of_shards" : "5",
"blocks" : {
"read_only_allow_delete" : "true"
},
"provided_name" : "idx",
"creation_date" : "1502371662615",
"number_of_replicas" : "0",
"uuid" : "HBCW_-NwTQObOdb0twtN8g",
"version" : {
"created" : "5050199",
"upgraded" : "6080699"
}
}
}
}
}
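The two numbers under "version" can be turned back into readable version strings. The following is a minimal sketch, assuming the ID packs major, minor and revision as two-digit groups followed by a two-digit build suffix; that assumption is consistent with the 5.5.1 in the error message and the 6.8.6 that was installed. The function name decode_version_id is made up for this example.

```python
def decode_version_id(version_id):
    """Turn an internal version ID such as "5050199" into "5.5.1"."""
    value = int(version_id)
    major = value // 1_000_000
    minor = (value // 10_000) % 100
    revision = (value // 100) % 100
    # The last two digits (the build suffix) are ignored here.
    return f"{major}.{minor}.{revision}"


if __name__ == "__main__":
    print("created: ", decode_version_id("5050199"))
    print("upgraded:", decode_version_id("6080699"))
```

So the index was created on 5.5.1 and has only been carried along by later versions, which is why 7.15 refuses to load it.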
and the data:
root@elk01:~# curl -H 'Content-Type: application/json' -X GET http://localhost:9200/idx/_search?pretty
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Which basically means: this index does not hold any data. Let's delete it, as I could not find any Elasticsearch dependency on an index named "idx":
root@elk01:~# curl -X DELETE 'http://localhost:9200/idx'
{"acknowledged":true}
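The "is it really empty?" check done by eye above could also be scripted before deleting anything. This is a minimal sketch that works on an already parsed _search response; the function names are made up for this example. It accepts hits.total both as a plain number (as in the 6.8 output above) and as an object with a "value" field, which is how 7.x reports it.

```python
def total_hits(search_response):
    """Return the hit count from a _search response (6.x number or 7.x object)."""
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


def is_empty_index(search_response):
    """True only if every shard answered, nothing timed out and no document matched."""
    shards = search_response["_shards"]
    all_shards_answered = shards["failed"] == 0 and shards["successful"] == shards["total"]
    return (
        all_shards_answered
        and not search_response["timed_out"]
        and total_hits(search_response) == 0
    )


if __name__ == "__main__":
    # The response shown above for the "idx" index.
    response = {
        "took": 2,
        "timed_out": False,
        "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
        "hits": {"total": 0, "max_score": None, "hits": []},
    }
    print("safe to consider empty:", is_empty_index(response))
```

Requiring all shards to answer avoids treating a partially failed search as proof that the index is empty.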
The error in Elasticsearch 7.15 only mentioned one index ID with this version compatibility problem. But are there others? Is this the only affected index, or is only the first failing index logged?
The Elasticsearch upgrade documentation also suggests checking the deprecation log file for such indices, but this turned out to be a waste of time: the log did not only mention the index causing the upgrade problem, so it could not be used to single it out!
Again, as this is a TEST cluster, let's find out.
Time to upgrade ES to 7.15 again:
root@elk01:~# apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
elasticsearch
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/349 MB of archives.
After this operation, 329 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Preparing to unpack .../elasticsearch_7.15.0_amd64.deb ...
Unpacking elasticsearch (7.15.0) over (6.8.6) ...
dpkg: warning: unable to delete old directory '/var/run/elasticsearch': Directory not empty
Processing triggers for systemd (229-4ubuntu21.31) ...
Processing triggers for ureadahead (0.100.0-19.1) ...
Setting up elasticsearch (7.15.0) ...
Once the configuration files elasticsearch.yml and jvm.options (for Java heap sizes) were adjusted for 7.15 once again, Elasticsearch was restarted:
root@elk01:~# systemctl restart elasticsearch
And this time, Elasticsearch was finally up.
root@elk01:~# netstat -lntup|grep 9200
tcp6 0 0 :::9200 :::* LISTEN 4640/java
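A listening port only shows that a Java process is bound to 9200. To confirm which version is actually answering, the root endpoint can be queried as well. This is a minimal sketch, assuming an unsecured node on http://localhost:9200 as in the curl calls above; the function names are made up for this example.

```python
import json
import urllib.error
import urllib.request


def running_version(root_response):
    """Extract the version string from the parsed response of GET /."""
    return root_response["version"]["number"]


def fetch_root(base_url="http://localhost:9200", timeout=5):
    """Fetch and parse the root endpoint of an Elasticsearch node."""
    with urllib.request.urlopen(base_url + "/", timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        print("Elasticsearch version:", running_version(fetch_root()))
    except (urllib.error.URLError, OSError) as error:
        print("Elasticsearch is not reachable:", error)
```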
The upgrade problem we hit on the test ELK stack showed that we should be cautious before upgrading Elasticsearch on our PROD ELK stack. The main problem is that such an "outdated" index is not flagged anywhere: neither the API marks it as deprecated, nor do the deprecation logs give any real clue.
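The index settings shown earlier do carry a "version.created" value, though, so one possible pre-upgrade check is to scan the settings of all indices while the cluster still runs the old version. This is a minimal sketch, not a tested procedure: it assumes the response of GET /_all/_settings has the same per-index layout as the settings output above, and that the leading digits of the version ID are the major version. The function names and the second index name in the sample data are made up for illustration.

```python
import json
import urllib.request


def created_major(version_id):
    """Major version encoded in an internal version ID, e.g. "5050199" -> 5."""
    return int(version_id) // 1_000_000


def find_outdated_indices(all_settings, minimum_major):
    """Return (index name, version ID) pairs created before the given major version."""
    outdated = []
    for name, entry in sorted(all_settings.items()):
        created = entry["settings"]["index"]["version"]["created"]
        if created_major(created) < minimum_major:
            outdated.append((name, created))
    return outdated


def fetch_all_settings(base_url="http://localhost:9200", timeout=10):
    """Fetch the settings of all indices from a running node."""
    with urllib.request.urlopen(f"{base_url}/_all/_settings", timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Sample data shaped like the settings output above; the second index is hypothetical.
    sample = {
        "idx": {"settings": {"index": {"version": {"created": "5050199"}}}},
        "logstash-2021.09.30": {"settings": {"index": {"version": {"created": "6080699"}}}},
    }
    # 7.x requires indices created on 6.0.0-beta1 or later, according to the error message.
    for name, created in find_outdated_indices(sample, minimum_major=6):
        print(f"{name} was created with version ID {created}: re-index or delete before upgrading")
```

Against a live 6.8 node, the sample dictionary would be replaced with the result of fetch_all_settings().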
One way to tackle this is to run a multi-node Elasticsearch cluster (we do in our PROD ELK stack, yay!). After the first node is upgraded to a new major version, it would most likely fail to start if any such index compatibility issues exist. These indices can then be re-indexed (or deleted if they hold no relevant data) on one of the other nodes, which still run the older Elasticsearch version. With proper load balancing this shouldn't even cause downtime.