The communication in an Elasticsearch cluster usually happens over the transport port (tcp/9300). With xpack enabled, TLS certificates can be installed and used to encrypt the communication between the nodes.
Here's an example of such an SSL/TLS setup with certificates in PEM format:
root@esnode1:~# cat /etc/elasticsearch/elasticsearch.yml
[...]
#xpack settings
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: certs/private.key
xpack.security.transport.ssl.certificate: certs/certificate.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/chain.crt" ]
When a certificate expires, the other nodes can no longer communicate with the node on which the certificate has expired. The logs fill up with entries mentioning "validity check failed". Somewhere in the middle of the stack trace you might also spot the actual reason: an expired validity date, reported by java.security.cert.CertificateExpiredException.
[2022-03-01T07:49:23,116][WARN ][o.e.d.PeerFinder ] [esnode3] address [192.168.22.51:9300], node [null], requesting [false] connection failed: [][192.168.22.51:9300] general node connection failure: handshake failed because connection reset
[2022-03-01T07:49:23,116][WARN ][o.e.t.TcpTransport ] [esnode3] exception caught on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:44304, remoteAddress=192.168.22.51/192.168.22.51:9300, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
[...]
Caused by: java.security.cert.CertPathValidatorException: validity check failed
[...]
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Mon Jan 17 00:59:59 CET 2022
[...]
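Since the real reason is buried in a chain of "Caused by:" lines, a small helper can pull it out of a log excerpt. The following is only a minimal sketch: the function names root_cause and expired_not_after are made up for this illustration, and the sample text is simply the excerpt shown above.

```python
import re

# Sample lines copied from the log excerpt above.
SAMPLE_LOG = """\
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
Caused by: java.security.cert.CertPathValidatorException: validity check failed
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Mon Jan 17 00:59:59 CET 2022
"""


def root_cause(log_text):
    """Return the last 'Caused by:' entry, i.e. the innermost exception (or None)."""
    causes = [
        line.split("Caused by:", 1)[1].strip()
        for line in log_text.splitlines()
        if line.lstrip().startswith("Caused by:")
    ]
    return causes[-1] if causes else None


def expired_not_after(log_text):
    """Return the NotAfter value of a CertificateExpiredException, if one is present."""
    match = re.search(r"CertificateExpiredException: NotAfter: (.+)", log_text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(root_cause(SAMPLE_LOG))
    print(expired_not_after(SAMPLE_LOG))
```

Run against a real Elasticsearch log, this would have to be fed one stack trace at a time, otherwise the last "Caused by:" of a completely different error might be returned.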
The validity of the installed certificate can easily be monitored, for example with the monitoring plugin check_ssl_cert, targeting the transport port (9300):
ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode1 -p 9300
SSL_CERT CRITICAL *.example.com: x509 certificate element 1 is expired (was valid until Jan 16 23:59:59 2022 GMT)|days_chain_elem1=-43;20;15;;
Obviously the certificate files (and possibly the private key) in the "certs" directory need to be replaced.
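The remaining-days value reported by the check is easy to reproduce. The sketch below is not how check_ssl_cert works internally; it only uses ssl.cert_time_to_seconds from the Python standard library to turn a "valid until" string into a day count. The function name days_until_expiry and the reference times are assumptions chosen for this example.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days until the given notAfter time (negative if already expired).

    not_after uses the OpenSSL text format, e.g. "Jan 16 23:59:59 2022 GMT".
    now must be a timezone-aware datetime; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expires = ssl.cert_time_to_seconds(not_after)
    # int() truncates towards zero, so an expiry 43.3 days ago yields -43.
    return int((expires - now.timestamp()) / 86400)


if __name__ == "__main__":
    # Assumed reference time: the morning of the incident.
    incident = datetime(2022, 3, 1, 8, 0, 0, tzinfo=timezone.utc)
    print(days_until_expiry("Jan 16 23:59:59 2022 GMT", incident))  # -43
```

A negative result means the certificate has already expired, which matches the days_chain_elem1=-43 performance data in the check output above.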
A more confusing error message in the logs is the empty client certificate chain error:
[2022-03-01T09:42:57,519][WARN ][o.e.t.TcpTransport ] [esnode3] exception caught on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:9300, remoteAddress=/192.168.15.31:47298, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Empty client certificate chain
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
[...]
Caused by: javax.net.ssl.SSLHandshakeException: Empty client certificate chain
Why is this error message confusing? Because one would assume that the problem lies on the client side, yet it is the server certificate chain that is not correctly installed.
This can again be verified by using check_ssl_cert:
ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode1 -p 9300
SSL_CERT CRITICAL *.example.com: Cannot verify certificate: unable to get local issuer certificate, unable to verify the first certificate|days_chain_elem1=321;20;15;;
The check only received one certificate (the server certificate) back, without the chain.
This is unexpected, as the chain is configured as well, using the xpack.security.transport.ssl.certificate_authorities setting:
root@esnode2:~# cat /etc/elasticsearch/elasticsearch.yml | grep certificate
xpack.security.transport.ssl.key: certs/private.key
xpack.security.transport.ssl.certificate: certs/certificate.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/chain.crt" ]
More research leads to Elasticsearch issue #31725 and the following comment:
Indeed the xpack.security.http.ssl.certificate setting should contain a chain.
The renewed certs/certificate.crt file only contained the server certificate in this case. By appending the chain to certificate.crt we can create a full chain:
root@esnode2:/etc/elasticsearch/certs# cat chain.crt >> certificate.crt
Right after this, Elasticsearch should automatically detect the change to the certificate file (no Elasticsearch restart required) and the following message should show up in the logs:
[2022-03-01T12:10:25,276][INFO ][o.e.x.c.s.SSLConfigurationReloader] [inf-elkesi01-p] reloaded [/etc/elasticsearch/certs/certificate.crt] and updated ssl contexts using this file
This fixes the empty client certificate chain errors and monitoring is happy, too:
ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode2 -p 9300
SSL_CERT OK - x509 certificate '*.example.com' from 'Gandi Standard SSL CA 2' valid until Jan 16 23:59:59 2023 GMT (expires in 321 days)|days_chain_elem1=321;20;15;; days_chain_elem2=925;20;15;; days_chain_elem3=5802;20;15;;
For users who, like me, are used to SSL configurations with a dedicated chain/CA file (e.g. Apache web server), the certificate_authorities setting is pretty confusing.
TL;DR: xpack.security.transport.ssl.certificate must point to a full-chain certificate file (server certificate followed by the chain) for a correct installation.
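Whether a PEM file really contains a full chain can be checked by counting its certificate blocks. The following sketch also shows a slightly more defensive way of concatenating the files than a plain append, as a server certificate file without a trailing newline would otherwise glue the END and BEGIN markers together. The helper names count_certificates and build_full_chain are hypothetical, and the PEM bodies in the usage part are placeholders, not real certificates.

```python
BEGIN_MARKER = "-----BEGIN CERTIFICATE-----"


def count_certificates(pem_text):
    """Count the certificate blocks in a PEM bundle."""
    return pem_text.count(BEGIN_MARKER)


def build_full_chain(server_pem, chain_pem):
    """Return server certificate followed by the chain, each part newline-terminated."""
    parts = []
    for pem in (server_pem, chain_pem):
        pem = pem.strip()
        if pem:
            parts.append(pem + "\n")
    return "".join(parts)


if __name__ == "__main__":
    # Placeholder content only; real files hold base64-encoded certificates.
    server = "-----BEGIN CERTIFICATE-----\nSERVER\n-----END CERTIFICATE-----"
    chain = (
        "-----BEGIN CERTIFICATE-----\nINTERMEDIATE\n-----END CERTIFICATE-----\n"
        "-----BEGIN CERTIFICATE-----\nROOT\n-----END CERTIFICATE-----\n"
    )
    full = build_full_chain(server, chain)
    print(count_certificates(server), count_certificates(full))
```

A count of 1 for the file behind xpack.security.transport.ssl.certificate is a strong hint that the chain is missing, unless the certificate was issued directly by a CA that all nodes already trust.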
Even with the full certificate chain in place, the log events (Caused by: javax.net.ssl.SSLHandshakeException: Empty client certificate chain) can still show up. This happens when hosts outside the cluster connect to the transport port 9300, for example a monitoring server using the check_ssl_cert plugin. As such a request to port 9300 does not include a client certificate, the error is actually correct and can be ignored.
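To tell harmless events from real ones, the remoteAddress field of the TcpTransport warning can be compared with the list of cluster nodes. This is a rough sketch under a few assumptions: the set of node addresses below only contains the two node IPs visible in the log excerpts above and would need to be completed, the helper names are invented for this example, and only IPv4 addresses are handled.

```python
import re

# Assumption: transport addresses of the cluster nodes; extend with all nodes.
CLUSTER_NODE_IPS = {"192.168.22.51", "192.168.22.53"}

# Matches both "remoteAddress=/1.2.3.4:47298" and "remoteAddress=host/1.2.3.4:9300".
_REMOTE_ADDRESS = re.compile(r"remoteAddress=[^/,\s]*/([0-9.]+):\d+")


def remote_ip(log_line):
    """Extract the remote IPv4 address from a TcpTransport log line, or None."""
    match = _REMOTE_ADDRESS.search(log_line)
    return match.group(1) if match else None


def is_from_cluster_node(log_line, node_ips=CLUSTER_NODE_IPS):
    """True if the connection in this log line came from a known cluster node."""
    return remote_ip(log_line) in node_ips


if __name__ == "__main__":
    line = (
        "[2022-03-01T09:42:57,519][WARN ][o.e.t.TcpTransport ] [esnode3] exception caught "
        "on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:9300, "
        "remoteAddress=/192.168.15.31:47298, profile=default}], closing connection"
    )
    print(remote_ip(line), is_from_cluster_node(line))
```

An "Empty client certificate chain" event whose remote address is not a cluster node can be ignored; one coming from a cluster node deserves a closer look at that node's certificate file.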