Handling validity check failed and empty client certificate chain errors in Elasticsearch

Written by - 0 comments

Published on - last updated on March 7th 2022 - Listed in Elasticsearch ELK Monitoring TLS SSL Security


The communication in an Elasticsearch cluster usually happens over the transport port (tcp/9300). With xpack enabled, TLS certificates can be installed and used to encrypt the communication between the nodes.

Here's an example of such a SSL/TLS setup with certificates in PEM format:

root@esnode1:~# cat /etc/elasticsearch/elasticsearch.yml
[...]
#xpack settings
xpack.security.enabled:  true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: certs/private.key
xpack.security.transport.ssl.certificate: certs/certificate.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/chain.crt" ]

Handling an expired certificate

When a certificate expires, the node(s) are unable to communicate with the node on which the certificate has expired. The logs will contain a lot of entries, mentioning validity check failed. Somewhere in the middle of the logs you might also spot the reason for it: An expired validity date mentioned by java.security.cert.CertificateExpiredException.

[2022-03-01T07:49:23,116][WARN ][o.e.d.PeerFinder         ] [esnode3] address [192.168.22.51:9300], node [null], requesting [false] connection failed: [][192.168.22.51:9300] general node connection failure: handshake failed because connection reset
[2022-03-01T07:49:23,116][WARN ][o.e.t.TcpTransport       ] [esnode3] exception caught on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:44304, remoteAddress=192.168.22.51/192.168.22.51:9300, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
[...]
Caused by: java.security.cert.CertPathValidatorException: validity check failed
[...]
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Mon Jan 17 00:59:59 CET 2022
[...]

The validity of the installed certificate can easily be monitored, for example with the monitoring plugin check_ssl_cert, targeting the transport port (9300):

ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode1 -p 9300
SSL_CERT CRITICAL *.example.com: x509 certificate element 1 is expired (was valid until Jan 16 23:59:59 2022 GMT)|days_chain_elem1=-43;20;15;;

Obviously the certificates (and maybe key) in the "certs" directory need to be replaced.

SSLHandshakeException: Empty client certificate chain

A more confusing error message in the logs is the empty client certificate chain error:

[2022-03-01T09:42:57,519][WARN ][o.e.t.TcpTransport       ] [esnode3] exception caught on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:9300, remoteAddress=/192.168.15.31:47298, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Empty client certificate chain
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
[...]
Caused by: javax.net.ssl.SSLHandshakeException: Empty client certificate chain

Why is this error message confusing? Because one would assume that the problem is happening on the client side - yet it's the server certificate chain which is not correctly installed.

This can again be verified by using check_ssl_cert:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode1 -p 9300
SSL_CERT CRITICAL *.example.com: Cannot verify certificate: unable to get local issuer certificate, unable to verify the first certificate|days_chain_elem1=321;20;15;; 

The check only received one certificate (the server certificate) back, without the chain. 

This is certainly not expected, as the chain is also configured, using the xpack.security.transport.ssl.certificate_authorities setting:

root@esnode2:~# cat /etc/elasticsearch/elasticsearch.yml | grep certificate
xpack.security.transport.ssl.key: certs/private.key
xpack.security.transport.ssl.certificate: certs/certificate.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/chain.crt" ]

More research leads to Elasticsearch issue #31725 and the following comment:

Indeed the xpack.security.http.ssl.certificate setting should contain a chain.

The renewed certs/certificate.crt file only contained the server certificate in this case. By appending the chain into certificate.crt we can create a full chain:

root@esnode2:/etc/elasticsearch/certs# cat chain.crt >> certificate.crt

Right after this, Elasticsearch should automatically discover a change in the SSL file (Elasticsearch restart not required) and the following message should show up in the logs:

[2022-03-01T12:10:25,276][INFO ][o.e.x.c.s.SSLConfigurationReloader] [inf-elkesi01-p] reloaded [/etc/elasticsearch/certs/certificate.crt] and updated ssl contexts using this file

This fixes the empty client certificate chain errors and monitoring is happy, too:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode2 -p 9300
SSL_CERT OK - x509 certificate '*.example.com' from 'Gandi Standard SSL CA 2' valid until Jan 16 23:59:59 2023 GMT (expires in 321 days)|days_chain_elem1=321;20;15;; days_chain_elem2=925;20;15;; days_chain_elem3=5802;20;15;;

For users knowing SSL configurations with a dedicated chain/CA file (e.g. Apache web server), such as me, the certificate_authorities setting is pretty confusing.

TL;DR: xpack.security.transport.ssl.certificate must be full-chain certificate for a correct installation.

Requests from non cluster nodes: empty client certificate chain

Even though the full certificate chain is fixed, the log events (Caused by: javax.net.ssl.SSLHandshakeException: Empty client certificate chain) can still show up. This is the case when non-cluster nodes access the transport port 9300, for example a monitoring server using the check_ssl_cert plugin. As the request to port 9300 does not include a client certificate, the error is actually correct but can be ignored.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder