Today I tried to use supervisor to control and supervise the state of a process - a process which should automatically be restarted in case of a failure.
I installed supervisor, created the config file with the [program:myapp] section and defined the relevant settings. But when I started the process with supervisorctl, I got this:
supervisorctl start myapp
myapp: ERROR (abnormal termination)
==> supervisord.log <==
2016-02-26 10:03:38,336 INFO spawned: 'myapp' with pid 6760
2016-02-26 10:03:38,380 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:40,384 INFO spawned: 'myapp' with pid 6769
2016-02-26 10:03:40,407 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:43,412 INFO spawned: 'myapp' with pid 6778
2016-02-26 10:03:43,435 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:44,436 INFO gave up: myapp entered FATAL state, too many start retries too quickly
What the hell? After some research I came across the following text - funnily enough in the official supervisor documentation (yeah, RTFM, I know):
Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started.
Ah crap. The program was written to run as a daemon by design, and it spawns several subprocesses itself. After several tests and attempted workarounds I ditched supervisor and moved on to monit.
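In hindsight the log excerpt already hints at the problem: each spawned process was gone within milliseconds. As far as I know, supervisor only counts a start as successful if the process stays up for startsecs (1 second by default) and gives up after startretries (3 by default) failed attempts. Here is a rough sketch that measures the spawn-to-exit times from the log lines quoted above; the variable names are just for illustration.

```shell
# Log lines copied from the supervisord.log excerpt above.
log=$(cat <<'EOF'
2016-02-26 10:03:38,336 INFO spawned: 'myapp' with pid 6760
2016-02-26 10:03:38,380 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:40,384 INFO spawned: 'myapp' with pid 6769
2016-02-26 10:03:40,407 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:43,412 INFO spawned: 'myapp' with pid 6778
2016-02-26 10:03:43,435 INFO exited: myapp (exit status 1; not expected)
EOF
)

# Convert the "HH:MM:SS,mmm" timestamp to milliseconds and print the
# time between each "spawned" line and the following "exited" line.
lifetimes=$(printf '%s\n' "$log" | awk '
  {
    split($2, t, /[:,]/)   # t[1]=HH, t[2]=MM, t[3]=SS, t[4]=mmm
    ms = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
  }
  $4 == "spawned:" { start = ms }
  $4 == "exited:"  { print ms - start }
')

# Prints 44, 23 and 23: far below one second every time.
printf 'lifetime in ms: %s\n' $lifetimes
```

All three attempts ended after a few dozen milliseconds, which is why supervisor gave up with "too many start retries too quickly".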
Installation, quick and painless:
apt-get install monit
Then I lowered the "check interval" from the default 120 seconds to 30 seconds:
sed -i "s/set daemon 120/set daemon 30/" /etc/monit/monitrc
And created a separate config file for the monit http daemon:
cat /etc/monit/conf.d/monit-http
set httpd port 2812 and
    allow localhost
Note: You should add authentication, too!
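A minimal sketch of what that could look like, assuming basic authentication with a single user is enough. The user name "admin" and the password "changeme" are placeholders, not values from my setup. The file is staged in a temporary directory, so nothing under /etc is touched until you copy it over yourself.

```shell
# Stage the config in a temporary directory for review.
staging_dir=$(mktemp -d)

cat > "$staging_dir/monit-http" <<'EOF'
set httpd port 2812 and
    use address localhost    # listen on the loopback address only
    allow localhost          # accept connections coming from localhost
    allow admin:changeme     # placeholder user:password, replace before use
EOF

# The password is stored in plain text, so keep the file private.
chmod 600 "$staging_dir/monit-http"

echo "Review $staging_dir/monit-http, then copy it to /etc/monit/conf.d/monit-http"
```

With a user defined, the monit web interface asks for these credentials instead of serving everyone who can reach the port.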
That's already the basic configuration for monit. Let's continue with the monit check for myapp:
cat /etc/monit/conf.d/myapp
check process myapp matching "/srv/myapp/bin/myapp"
    start program = "/etc/init.d/myapp start"
    stop program = "/etc/init.d/myapp stop"
    if failed host localhost port 8088 protocol http then restart
    if 5 restarts within 5 cycles then timeout
Because this daemon "myapp" doesn't create a PID file when it's started, I used the "matching" option (instead of the "with pidfile" option seen everywhere in the monit documentation). Matching simply checks whether a running process matches the given pattern ("/srv/myapp/bin/myapp").
"start program" defines which command to execute to start the application.
"stop program" defines which command to execute to stop the application.
The first if condition defines the application-specific check. In this case the application listens on port 8088 and acts as a small web server. If the check fails to access "localhost port 8088" with "protocol http", then a restart should be initiated.
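To try roughly the same check by hand, something like the following should do. This is only an approximation, assuming curl is installed; check_http is a made-up helper name, and monit's own rules for what counts as a failed HTTP check may differ in detail.

```shell
# check_http HOST PORT: succeed only if an HTTP request gets a non-error answer.
check_http() {
    # -f: treat HTTP status codes >= 400 as failure
    # -s: no progress output, -o /dev/null: discard the response body
    # --max-time 5: give up after 5 seconds
    curl -fs -o /dev/null --max-time 5 "http://$1:$2/"
}

if check_http localhost 8088; then
    echo "myapp answers on port 8088"
else
    echo "myapp does not answer on port 8088"
fi
```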
If myapp had to be restarted 5 times within 5 cycles, then monit should time out, which means myapp will no longer be monitored.
A cycle in this case is the "check interval" defined in /etc/monit/monitrc by the "set daemon" option. As this was previously set to 30 (seconds), this means 5 cycles = 150 seconds.
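The arithmetic as a tiny sketch, in case you pick a different interval. The variable names are made up for this example, and the values are the ones from this setup.

```shell
cycle_seconds=30    # value given to "set daemon" in /etc/monit/monitrc
window_cycles=5     # the "within 5 cycles" part of the timeout rule

# Length of the time window in which the restarts are counted.
window_seconds=$((cycle_seconds * window_cycles))

echo "$window_cycles cycles of $cycle_seconds seconds = $window_seconds seconds"
```

With the default of 120 seconds the same rule would cover a window of 600 seconds.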
Activate the new config:
/etc/init.d/monit restart
And now the status can be checked:
monit status
The Monit daemon 5.6 uptime: 0m
Process 'myapp'
status Running
monitoring status Monitored
pid 19751
parent pid 1
uptime 1h 45m
children 0
memory kilobytes 419324
memory kilobytes total 419324
memory percent 6.8%
memory percent total 6.8%
cpu percent 0.0%
cpu percent total 0.0%
port response time 0.001s to 192.168.253.111:8088 [HTTP via TCP]
data collected Fri, 26 Feb 2016 13:55:27
System 'myapp-app01-test'
status Running
monitoring status Monitored
load average [0.19] [0.15] [0.14]
cpu 2.4%us 0.9%sy 0.1%wa
memory usage 2748432 kB [44.9%]
swap usage 171584 kB [4.3%]
data collected Fri, 26 Feb 2016 13:55:27
Now what happens if the process myapp dies? Let's try:
kill -9 `pgrep myapp`; tail -f /var/log/monit.log
A few seconds later the following entries appeared:
[CET Feb 26 13:58:27] error : 'myapp' process is not running
[CET Feb 26 13:58:27] info : 'myapp' trying to restart
[CET Feb 26 13:58:27] info : 'myapp' start: /etc/init.d/myapp
[CET Feb 26 13:58:57] info : 'myapp' process is running with pid 4405
And the process is up again:
pgrep myapp
4405
Success! I love to go into the weekend like this! :-)