monit unexpectedly restarted web application server due to http status


Published on - Listed in Linux Monitoring


I had to investigate a problem where monit (a small but powerful daemon for monitoring processes) had restarted the process of an application server.

The application server itself serves a web application and listens on port 8088. The monit check looks like this:

check process application matching "/srv/application/deploy/bin/application"
    start program = "/etc/init.d/application start"
    stop program = "/etc/init.d/application stop"
    if failed host 127.0.0.1 port 8088 protocol http then restart
    if 5 restarts within 5 cycles then timeout

In the monit logs (/var/log/monit.log) the restart can clearly be seen:

[CEST Oct  5 20:26:49] error    : 'application' failed protocol test [HTTP] at INET[127.0.0.1:8088] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST Oct  5 20:26:49] info     : 'application' trying to restart
[CEST Oct  5 20:26:49] info     : 'application' stop: /etc/init.d/application

I also checked whether our Icinga monitoring had seen the same problem. It had not: Icinga's HTTP check kept working. I then compared this event with previous monit restarts. In the past, when port 8088 was not reachable, monit logged the following line:

error    : 'application' failed, cannot open a connection to INET[127.0.0.1:8088] via TCP

But in this case the logged event clearly says:

'application' failed protocol test [HTTP] at INET[127.0.0.1:8088] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable

The difference is therefore on layer 7: the port was up, but something went wrong at the HTTP level. The message "Resource temporarily unavailable" sounded somewhat familiar to me. I manage a lot of reverse proxies, and whenever the upstream/backend server is gone, a 50x error is shown with a similar message. I asked the developer of this application whether the application had served a 5xx. It turns out: almost. A backend resource of the application was unavailable at that very moment and the application, by design, started to respond with HTTP status 423 (Locked). I checked the monit documentation and indeed found a very important piece of information:

STATUS option can be used to explicitly test the HTTP status code returned by the HTTP server. If not used, the HTTP protocol test will fail if the status code returned is greater than or equal to 400. You can override this behaviour by using the status qualifier.

Here we go. By default, monit expects an HTTP status below 400. As soon as the application returned a 423, monit considered the application failed and restarted it.
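For illustration, the status qualifier mentioned in the documentation could be used to relax that default. The following is only a sketch, not the fix that was applied here: the threshold of 500 is an assumption, chosen so that a 423 no longer counts as a failure while server errors still do.

check process application matching "/srv/application/deploy/bin/application"
    start program = "/etc/init.d/application start"
    stop program = "/etc/init.d/application stop"
    # Assumption: treat any status below 500 (including 423) as healthy
    if failed host 127.0.0.1 port 8088 protocol http status < 500 then restart
    if 5 restarts within 5 cycles then timeout

The downside of this approach is that genuine client-side errors such as a 404 would also pass the check, so it only fits if those are not a sign of a broken application.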

To solve this problem, monit will from now on request a specially crafted application URL (using the "request" option in the monit check) that always returns HTTP status 200 as long as the application itself is running.
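A minimal sketch of such a check could look like this. The path "/monit-alive" is a hypothetical placeholder, as the real URL of the application is not shown here; everything else is taken from the original check.

check process application matching "/srv/application/deploy/bin/application"
    start program = "/etc/init.d/application start"
    stop program = "/etc/init.d/application stop"
    # "/monit-alive" is a hypothetical path; use the URL the application provides
    if failed host 127.0.0.1 port 8088 protocol http request "/monit-alive" then restart
    if 5 restarts within 5 cycles then timeout

With this in place, monit only judges whether the application process itself answers, independent of the state of its backend resources.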

