A recent problem in our ELK stack this week had quite an (unexpected) impact. We experienced problems in our Docker environments (see Docker logging with GELF using tcp: The good, the bad, the ugly) and also forced myself to update the monitoring plugin check_es_system.
But once I got the Elasticsearch cluster running again, one problem remained: 5 shards were unable to be allocated/assigned:
admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep "(UNASSIGNED|INIT)"
docker-2019.04.03 4 r UNASSIGNED
docker-2019.04.03 0 r UNASSIGNED
datateam-2019.03.25 1 r UNASSIGNED
haproxy-2019.03.23 2 r UNASSIGNED
.security-6 0 r UNASSIGNED
I already came across such a problem with shards not able to be assigned and was able to solve it by manually assigning these shards using the commands "allocate_replica" and "allocate_stale_primary" (see ElasticSearch cluster stays red, stuck unassigned shards not being assigned). But this time this didn't work:
admin@es01:~$ curl -u elastic:pass -X POST -H "Content-Type: application/json" "http://localhost:9200/_cluster/reroute?retry_failed=true" -d "{ \"commands\" : [ { \"allocate_replica\": { \"index\": \".security-6\", \"shard\": 0, \"node\": \"es01\" } } ] }"
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [.security-6][0] on node {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-03T08:23:28.094Z], failed_attempts[5], delayed=false, details[failed shard on node [0o0eQXxcSJuWIFG2ohjwUg]: failed recovery, failure RecoveryFailedException[[.security-6][0]: Recovery failed from {es02}{t3GAvhY1SS2xZkt4U389jw}{jBEFGCmBQeOiYpLUF9zrgw}{192.168.66.12}{192.168.66.12:9300}{ml.machine_memory=31624278016, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[es02][192.168.66.12:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ReceiveTimeoutTransportException[[es01][192.168.66.11:9300][internal:index/shard/recovery/prepare_translog] request_id [3025546] timed out after [900098ms]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.5.4] since this is equal-or-newer than the primary version [6.5.4])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [1.8tb], shard size: [0b], free after allocating shard: [1.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"}],"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [.security-6][0] on node {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-03T08:23:28.094Z], failed_attempts[5], delayed=false, details[failed shard on node [0o0eQXxcSJuWIFG2ohjwUg]: failed recovery, failure RecoveryFailedException[[.security-6][0]: Recovery failed from {es02}{t3GAvhY1SS2xZkt4U389jw}{jBEFGCmBQeOiYpLUF9zrgw}{192.168.66.12}{192.168.66.12:9300}{ml.machine_memory=31624278016, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[es02][192.168.66.12:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ReceiveTimeoutTransportException[[es01][192.168.66.11:9300][internal:index/shard/recovery/prepare_translog] request_id [3025546] timed out after [900098ms]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.5.4] since this is equal-or-newer than the primary version [6.5.4])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [1.8tb], shard size: [0b], free after allocating shard: [1.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"},"status":400}
Yes, it's quite a big error message. The important part is the following though:
reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry
As you can see in the command before, I already tried it with the "retry_failed" parameter, with the same error as a result.
That's one of these moment when you need to research and read the documentation. The relevant information can be found on the documentation's page about Cluster Reroute:
The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5), before giving up and leaving the shard unallocated.
Let's check the current settings of that index (.security-6):
admin@es01:~$ curl -u elastic:pass "http://localhost:9200/.security-6/_settings?pretty"
{
".security-6" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"auto_expand_replicas" : "0-all",
"provided_name" : ".security-6",
"format" : "6",
"creation_date" : "1533818873742",
"analysis" : {
"filter" : {
"email" : {
"type" : "pattern_capture",
"preserve_original" : "true",
"patterns" : [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)"
]
}
},
"analyzer" : {
"email" : {
"filter" : [
"email",
"lowercase",
"unique"
],
"tokenizer" : "uax_url_email"
}
}
},
"priority" : "1000",
"number_of_replicas" : "1",
"uuid" : "gCuCQStxS-e9dKbuJqH-QQ",
"version" : {
"created" : "6030199",
"upgraded" : "6050499"
}
}
}
}
}
The index's settings does not contain a "index.allocation.max_retries" value. This means the default (5) is used.
We can update the index's settings and set "index.allocation.max_retries" to a higher level:
admin@es01:~$ curl -u elastic:pass -H "Content-Type: application/json" -X PUT "http://localhost:9200/.security-6/_settings" -d "{ \"index.allocation.max_retries\": 6}"
{"acknowledged":true}
Verify that the new setting appears:
admin@es01:~$ curl -u elastic:pass "http://localhost:9200/.security-6/_settings?pretty"
{
".security-6" : {
"settings" : {
"index" : {
"allocation" : {
"max_retries" : "6"
},
"number_of_shards" : "1",
"auto_expand_replicas" : "0-all",
"provided_name" : ".security-6",
"format" : "6",
"creation_date" : "1533818873742",
"analysis" : {
"filter" : {
"email" : {
"type" : "pattern_capture",
"preserve_original" : "true",
"patterns" : [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)"
]
}
},
"analyzer" : {
"email" : {
"filter" : [
"email",
"lowercase",
"unique"
],
"tokenizer" : "uax_url_email"
}
}
},
"priority" : "1000",
"number_of_replicas" : "1",
"uuid" : "gCuCQStxS-e9dKbuJqH-QQ",
"version" : {
"created" : "6030199",
"upgraded" : "6050499"
}
}
}
}
}
Yep, it's there!
And now try to allocate the replica shard again:
admin@es01:~$ curl -u elastic:pass -X POST -H "Content-Type: application/json" "http://localhost:9200/_cluster/reroute?retry_failed=true" -d "{ \"commands\" : [ { \"allocate_replica\": { \"index\": \".security-6\", \"shard\": 0, \"node\": \"es01\" } } ] }"
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[allocate_replica] all copies of [.security-6][0] are already assigned. Use the move allocation command instead"}],"type":"illegal_argument_exception","reason":"[allocate_replica] all copies of [.security-6][0] are already assigned. Use the move allocation command instead"},"status":400}
Oh, it's already assigned? What happened?
admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep ".security-6"
.security-6 0 p STARTED 15 55.3kb 192.168.66.12 es02
.security-6 0 r STARTED 15 39.6kb 192.168.66.11 es01
Seems that the index was automatically allocated, once the "index.allocation.max_retries" value was set to 6.
After I updated the index settings of the remaining unassigned replica shards, they were all automatically allocated:
admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep "(UNASSIGNED|INIT)"
docker-2019.04.03 4 r INITIALIZING 192.168.66.11 es01
docker-2019.04.03 0 r INITIALIZING 192.168.66.11 es01
The monitoring plugin check_es_system showed the initializing shards during that time (no more unassigned shards discovered):
admin@es01:~$ /usr/lib/nagios/plugins/check_es_system.sh -H "192.168.66.12" -t status -u elastic -p pass
ES SYSTEM WARNING - Elasticsearch Cluster is yellow (2 nodes, 2 data nodes, 1752 shards, 0 relocating shards, 2 initializing shards, 0 unassigned shards, 6771000328 docs)|total_nodes=2;;;; data_nodes=2;;;; total_shards=1752;;;; relocating_shards=0;;;; initializing_shards=2;;;; unassigned_shards=0;;;; docs=6771000328;;;;
It took a few moments until the cluster was finally green again:
admin@es01:~$ /usr/lib/nagios/plugins/check_es_system.sh -H "192.168.66.12" -t status -u elastic -p pass
ES SYSTEM OK - Elasticsearch Cluster is green (2 nodes, 2 data nodes, 1754 shards, 6771208539 docs)|total_nodes=2;;;; data_nodes=2;;;; total_shards=1754;;;; relocating_shards=0;;;; initializing_shards=0;;;; unassigned_shards=0;;;; docs=6771208539;;;;
Mike from wrote on Dec 18th, 2021:
Legend, thanks for this post! I've been fighting with ES for 24 hours and I had got everything back but one shard and this was the last piece of the puzzle. I can now go get some sleep
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder