Elasticsearch: Shards fail to allocate due to maximum number of retries exceeded

Written by - 1 comments

Published on - Listed in ELK Elasticsearch Linux


A recent problem in our ELK stack this week had quite an (unexpected) impact. We experienced problems in our Docker environments (see Docker logging with GELF using tcp: The good, the bad, the ugly) and also forced myself to update the monitoring plugin check_es_system.

But once I got the Elasticsearch cluster running again, one problem remained: 5 shards were unable to be allocated/assigned:

admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass  | egrep "(UNASSIGNED|INIT)"
docker-2019.04.03               4 r UNASSIGNED
docker-2019.04.03               0 r UNASSIGNED
datateam-2019.03.25             1 r UNASSIGNED
haproxy-2019.03.23              2 r UNASSIGNED
.security-6                     0 r UNASSIGNED

I already came across such a problem with shards not able to be assigned and was able to solve it by manually assigning these shards using the commands "allocate_replica" and "allocate_stale_primary" (see ElasticSearch cluster stays red, stuck unassigned shards not being assigned). But this time this didn't work:

admin@es01:~$ curl -u elastic:pass -X POST -H "Content-Type: application/json" "http://localhost:9200/_cluster/reroute?retry_failed=true" -d "{ \"commands\" : [ { \"allocate_replica\": { \"index\": \".security-6\", \"shard\": 0, \"node\": \"es01\" } } ]  }"
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [.security-6][0] on node {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-03T08:23:28.094Z], failed_attempts[5], delayed=false, details[failed shard on node [0o0eQXxcSJuWIFG2ohjwUg]: failed recovery, failure RecoveryFailedException[[.security-6][0]: Recovery failed from {es02}{t3GAvhY1SS2xZkt4U389jw}{jBEFGCmBQeOiYpLUF9zrgw}{192.168.66.12}{192.168.66.12:9300}{ml.machine_memory=31624278016, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[es02][192.168.66.12:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ReceiveTimeoutTransportException[[es01][192.168.66.11:9300][internal:index/shard/recovery/prepare_translog] request_id [3025546] timed out after [900098ms]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.5.4] since this is equal-or-newer than the primary version [6.5.4])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [1.8tb], shard size: [0b], free after allocating shard: [1.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"}],"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [.security-6][0] on node {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-03T08:23:28.094Z], failed_attempts[5], delayed=false, details[failed shard on node [0o0eQXxcSJuWIFG2ohjwUg]: failed recovery, failure RecoveryFailedException[[.security-6][0]: Recovery failed from {es02}{t3GAvhY1SS2xZkt4U389jw}{jBEFGCmBQeOiYpLUF9zrgw}{192.168.66.12}{192.168.66.12:9300}{ml.machine_memory=31624278016, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[es02][192.168.66.12:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ReceiveTimeoutTransportException[[es01][192.168.66.11:9300][internal:index/shard/recovery/prepare_translog] request_id [3025546] timed out after [900098ms]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.5.4] since this is equal-or-newer than the primary version [6.5.4])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [1.8tb], shard size: [0b], free after allocating shard: [1.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"},"status":400}

Yes, it's quite a big error message. The important part is the following though:

reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry

As you can see in the command before, I already tried it with the "retry_failed" parameter, with the same error as a result.

That's one of these moment when you need to research and read the documentation. The relevant information can be found on the documentation's page about Cluster Reroute:

The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5), before giving up and leaving the shard unallocated.

 Let's check the current settings of that index (.security-6):

admin@es01:~$ curl -u elastic:pass "http://localhost:9200/.security-6/_settings?pretty"
{
  ".security-6" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-all",
        "provided_name" : ".security-6",
        "format" : "6",
        "creation_date" : "1533818873742",
        "analysis" : {
          "filter" : {
            "email" : {
              "type" : "pattern_capture",
              "preserve_original" : "true",
              "patterns" : [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)"
              ]
            }
          },
          "analyzer" : {
            "email" : {
              "filter" : [
                "email",
                "lowercase",
                "unique"
              ],
              "tokenizer" : "uax_url_email"
            }
          }
        },
        "priority" : "1000",
        "number_of_replicas" : "1",
        "uuid" : "gCuCQStxS-e9dKbuJqH-QQ",
        "version" : {
          "created" : "6030199",
          "upgraded" : "6050499"
        }
      }
    }
  }
}

The index's settings does not contain a "index.allocation.max_retries" value. This means the default (5) is used.

We can update the index's settings and set "index.allocation.max_retries" to a higher level:

admin@es01:~$ curl -u elastic:pass -H "Content-Type: application/json" -X PUT "http://localhost:9200/.security-6/_settings" -d "{ \"index.allocation.max_retries\": 6}"
{"acknowledged":true}

Verify that the new setting appears:

admin@es01:~$ curl -u elastic:pass "http://localhost:9200/.security-6/_settings?pretty"
{
  ".security-6" : {
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "6"
        },

        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-all",
        "provided_name" : ".security-6",
        "format" : "6",
        "creation_date" : "1533818873742",
        "analysis" : {
          "filter" : {
            "email" : {
              "type" : "pattern_capture",
              "preserve_original" : "true",
              "patterns" : [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)"
              ]
            }
          },
          "analyzer" : {
            "email" : {
              "filter" : [
                "email",
                "lowercase",
                "unique"
              ],
              "tokenizer" : "uax_url_email"
            }
          }
        },
        "priority" : "1000",
        "number_of_replicas" : "1",
        "uuid" : "gCuCQStxS-e9dKbuJqH-QQ",
        "version" : {
          "created" : "6030199",
          "upgraded" : "6050499"
        }
      }
    }
  }
}

Yep, it's there!

And now try to allocate the replica shard again:

admin@es01:~$ curl -u elastic:pass -X POST -H "Content-Type: application/json" "http://localhost:9200/_cluster/reroute?retry_failed=true" -d "{ \"commands\" : [ { \"allocate_replica\": { \"index\": \".security-6\", \"shard\": 0, \"node\": \"es01\" } } ]  }"
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[allocate_replica] all copies of [.security-6][0] are already assigned. Use the move allocation command instead"}],"type":"illegal_argument_exception","reason":"[allocate_replica] all copies of [.security-6][0] are already assigned. Use the move allocation command instead"},"status":400}

Oh, it's already assigned? What happened?

admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep ".security-6"
.security-6    0 p STARTED     15  55.3kb 192.168.66.12 es02
.security-6    0 r STARTED     15  39.6kb 192.168.66.11 es01

Seems that the index was automatically allocated, once the "index.allocation.max_retries" value was set to 6.

After I updated the index settings of the remaining unassigned replica shards, they were all automatically allocated:

admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep "(UNASSIGNED|INIT)"
docker-2019.04.03   4 r INITIALIZING   192.168.66.11 es01
docker-2019.04.03   0 r INITIALIZING   192.168.66.11 es01

The monitoring plugin check_es_system showed the initializing shards during that time (no more unassigned shards discovered):

admin@es01:~$ /usr/lib/nagios/plugins/check_es_system.sh -H "192.168.66.12" -t status -u elastic -p pass
ES SYSTEM WARNING - Elasticsearch Cluster is yellow (2 nodes, 2 data nodes, 1752 shards, 0 relocating shards, 2 initializing shards, 0 unassigned shards, 6771000328 docs)|total_nodes=2;;;; data_nodes=2;;;; total_shards=1752;;;; relocating_shards=0;;;; initializing_shards=2;;;; unassigned_shards=0;;;; docs=6771000328;;;;

It took a few moments until the cluster was finally green again:

admin@es01:~$ /usr/lib/nagios/plugins/check_es_system.sh -H "192.168.66.12" -t status -u elastic -p pass
ES SYSTEM OK - Elasticsearch Cluster is green (2 nodes, 2 data nodes, 1754 shards, 6771208539 docs)|total_nodes=2;;;; data_nodes=2;;;; total_shards=1754;;;; relocating_shards=0;;;; initializing_shards=0;;;; unassigned_shards=0;;;; docs=6771208539;;;;


Add a comment

Show form to leave a comment

Comments (newest first)

Mike from wrote on Dec 18th, 2021:

Legend, thanks for this post! I've been fighting with ES for 24 hours and I had got everything back but one shard and this was the last piece of the puzzle. I can now go get some sleep


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder