lxc-destroy command not working when run as cron job

Written by - 0 comments

Published on - Listed in Linux LXC


Been pulling my hair out (what's left of it) because a script which was working fine when I launched it manually, but the very same script did not work when it was run by cron. I narrowed it down to the "lxc-destroy" command which simply does not do anything when run by cron. All this on an Ubuntu 14.04 LTS with LXC 1.0.7.

I was able to reproduce it with a simple cronjob like this:

03 08 * * * lxc-stop -n mycontainer && lxc-destroy -n mycontainer

What happened right after the cronjob ran? The container was correctly stopped - but remained stopped. The container was still showing up in lxc-ls (in state STOPPED).  I suspected there is a problem deleting the logical volume (lxc.rootfs) of the container when lxc-destroy is launched, because a quick "lvs" after the cronjob ran through still showed the LV of the container.

Now I ran the same commands with strace to see more:

41 08 * * * strace -s 1000 -ff -o /var/log/stop.log lxc-stop -n nzzad-app01-test && strace -s 1000 -ff -o /var/log/destroy.log lxc-destroy -n nzzad-app01-test

This actually created several log files (each child process got its own log file):

-rw-r--r-- 1 root      root   2401 Sep 25 08:41 destroy.log.23100
-rw-r--r-- 1 root      root   2401 Sep 25 08:41 destroy.log.23101
-rw-r--r-- 1 root      root   5180 Sep 25 08:41 destroy.log.23102
-rw-r--r-- 1 root      root    376 Sep 25 08:41 destroy.log.23103
-rw-r--r-- 1 root      root  61340 Sep 25 08:41 destroy.log.23099

The relevant part of the parent PID (23099) showed:

stat("/var/lib/lxc/mycontainer/config", {st_mode=S_IFREG|0644, st_size=808, ...}) = 0
pipe([3, 4])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23100
close(4)                                = 0
read(3, "\377\377\377\377", 4)          = 4
close(3)                                = 0
wait4(23100, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23100
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23100, si_status=1, si_utime=0, si_stime=0} ---
socket(PF_LOCAL, SOCK_STREAM, 0)        = 3
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 40) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 110) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0
stat("/var/lib/lxc/mycontainer/config", {st_mode=S_IFREG|0644, st_size=808, ...}) = 0
open("/run/lxc/lock//var/lib/lxc/mycontainer", O_RDWR|O_CREAT, 0600) = 3
fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
pipe([4, 5])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23101
close(5)                                = 0
read(4, "\377\377\377\377", 4)          = 4
close(4)                                = 0
wait4(23101, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23101
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23101, si_status=1, si_utime=0, si_stime=0} ---
socket(PF_LOCAL, SOCK_STREAM, 0)        = 4
connect(4, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 40) = -1 ECONNREFUSED (Connection refused)
connect(4, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 110) = -1 ECONNREFUSED (Connection refused)
close(4)                                = 0
open("/var/lib/lxc/mycontainer/lxc_snapshots", O_RDONLY) = -1 ENOENT (No such file or directory)
geteuid()                               = 0
pipe2([4, 5], O_CLOEXEC)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23102
close(5)                                = 0
fcntl(4, F_GETFL)                       = 0 (flags O_RDONLY)
fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8da4aff000
lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(4, "", 4096)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23102, si_status=127, si_utime=0, si_stime=0} ---
close(4)                                = 0
munmap(0x7f8da4aff000, 4096)            = 0
wait4(23102, [{WIFEXITED(s) && WEXITSTATUS(s) == 127}], 0, NULL) = 23102
stat("/dev/vglxc/mycontainer", {st_mode=S_IFBLK|0660, st_rdev=makedev(252, 0), ...}) = 0
open("/sys/dev/block/252:0/dm/uuid", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8da4aff000
read(4, "LVM-AGQfFleAZmtwXaRlf3qusphA68nyKV56qCiz847f5Jd9gR5Em5pzaJdSfo24s4br\n", 4096) = 69
close(4)                                = 0
munmap(0x7f8da4aff000, 4096)            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23103
wait4(23103, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23103
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23103, si_status=1, si_utime=0, si_stime=0} ---
write(2, "lxc_container: ", 15)         = 15
write(2, "lxccontainer.c: lxcapi_destroy: 2073 ", 37) = 37
write(2, "Error destroying rootfs for mycontainer", 44) = 44
write(2, "\n", 1)                       = 1
fcntl(3, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(3)                                = 0
write(2, "Destroying mycontainer failed\n", 35) = 35
exit_group(1)                           = ?
+++ exited with 1 +++

The more interesting part can be found in the log file of the last child pid (23103) which is responsible for the deletion of the logical volume:

set_robust_list(0x7f8da4af2ba0, 24)     = 0
execve("/usr/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 7 vars */]) = -1 ENOENT (No such file or directory)
execve("/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 7 vars */]) = -1 ENOENT (No such file or directory)
exit_group(1)                           = ?
+++ exited with 1 +++

Now I decided to run the exacte same lxc-destroy command "manually" on the command line, also using strace to log and catch a difference:

strace -s 1000 -ff -o /var/log/manualdestroy.log lxc-destroy -n mycontainer

And indeed! In the log file of the last child process I found this:

set_robust_list(0x7fde2db25ba0, 24)     = 0
execve("/usr/local/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/local/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = 0
brk(0)                                  = 0x1a48000

I don't have to be a C programmer to understand what happens. When the command was launched by cron, the chlid process trying to remove the logical volume was looking for the lvremove command in /usr/bin and in /bin. When it didn't find it there, it exited with an error.
When I ran it manually, the same command was not only looking in /usr/bin and /bin, but also in /usr/local/sbin, /usr/local/bin, /usr/sbin and finally in /sbin where it found the lvremove command and continued.

But why is there a difference in which directories commands should be looked for? The answer is PATH.

The $PATH variable of root when run on the cli:

echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Now to see the $PATH variable of cron, I created a cronjob like this:

58 08 * * * echo $PATH > /var/log/path.log

And now let's take a look at /var/log/path.log:

cat /var/log/path.log
/usr/bin:/bin

Here we have the difference - and the reason why lxc-destroy does not work when run by cron.

So the solution for this problem is to define the PATH variable before running lxc-destroy, e.g. like this:

21 09 * * * export PATH=$PATH:/sbin; lxc-stop -n nzzad-app02-test && lxc-destroy -n nzzad-app02-test

Now lxc-destroy ran perfectly.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder