Been pulling my hair out (what's left of it) because a script which was working fine when I launched it manually, but the very same script did not work when it was run by cron. I narrowed it down to the "lxc-destroy" command which simply does not do anything when run by cron. All this on an Ubuntu 14.04 LTS with LXC 1.0.7.
I was able to reproduce it with a simple cronjob like this:
03 08 * * * lxc-stop -n mycontainer && lxc-destroy -n mycontainer
What happened right after the cronjob ran? The container was correctly stopped - but remained stopped. The container was still showing up in lxc-ls (in state STOPPED). I suspected there is a problem deleting the logical volume (lxc.rootfs) of the container when lxc-destroy is launched, because a quick "lvs" after the cronjob ran through still showed the LV of the container.
Now I ran the same commands with strace to see more:
41 08 * * * strace -s 1000 -ff -o /var/log/stop.log lxc-stop -n nzzad-app01-test && strace -s 1000 -ff -o /var/log/destroy.log lxc-destroy -n nzzad-app01-test
This actually created several log files (each child process got its own log file):
-rw-r--r-- 1 root root 2401 Sep 25 08:41 destroy.log.23100
-rw-r--r-- 1 root root 2401 Sep 25 08:41 destroy.log.23101
-rw-r--r-- 1 root root 5180 Sep 25 08:41 destroy.log.23102
-rw-r--r-- 1 root root 376 Sep 25 08:41 destroy.log.23103
-rw-r--r-- 1 root root 61340 Sep 25 08:41 destroy.log.23099
The relevant part of the parent PID (23099) showed:
stat("/var/lib/lxc/mycontainer/config", {st_mode=S_IFREG|0644, st_size=808, ...}) = 0
pipe([3, 4]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23100
close(4) = 0
read(3, "\377\377\377\377", 4) = 4
close(3) = 0
wait4(23100, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23100
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23100, si_status=1, si_utime=0, si_stime=0} ---
socket(PF_LOCAL, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 40) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 110) = -1 ECONNREFUSED (Connection refused)
close(3) = 0
stat("/var/lib/lxc/mycontainer/config", {st_mode=S_IFREG|0644, st_size=808, ...}) = 0
open("/run/lxc/lock//var/lib/lxc/mycontainer", O_RDWR|O_CREAT, 0600) = 3
fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
pipe([4, 5]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23101
close(5) = 0
read(4, "\377\377\377\377", 4) = 4
close(4) = 0
wait4(23101, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23101
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23101, si_status=1, si_utime=0, si_stime=0} ---
socket(PF_LOCAL, SOCK_STREAM, 0) = 4
connect(4, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 40) = -1 ECONNREFUSED (Connection refused)
connect(4, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 110) = -1 ECONNREFUSED (Connection refused)
close(4) = 0
open("/var/lib/lxc/mycontainer/lxc_snapshots", O_RDONLY) = -1 ENOENT (No such file or directory)
geteuid() = 0
pipe2([4, 5], O_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23102
close(5) = 0
fcntl(4, F_GETFL) = 0 (flags O_RDONLY)
fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8da4aff000
lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
read(4, "", 4096) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23102, si_status=127, si_utime=0, si_stime=0} ---
close(4) = 0
munmap(0x7f8da4aff000, 4096) = 0
wait4(23102, [{WIFEXITED(s) && WEXITSTATUS(s) == 127}], 0, NULL) = 23102
stat("/dev/vglxc/mycontainer", {st_mode=S_IFBLK|0660, st_rdev=makedev(252, 0), ...}) = 0
open("/sys/dev/block/252:0/dm/uuid", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8da4aff000
read(4, "LVM-AGQfFleAZmtwXaRlf3qusphA68nyKV56qCiz847f5Jd9gR5Em5pzaJdSfo24s4br\n", 4096) = 69
close(4) = 0
munmap(0x7f8da4aff000, 4096) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23103
wait4(23103, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23103
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23103, si_status=1, si_utime=0, si_stime=0} ---
write(2, "lxc_container: ", 15) = 15
write(2, "lxccontainer.c: lxcapi_destroy: 2073 ", 37) = 37
write(2, "Error destroying rootfs for mycontainer", 44) = 44
write(2, "\n", 1) = 1
fcntl(3, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(3) = 0
write(2, "Destroying mycontainer failed\n", 35) = 35
exit_group(1) = ?
+++ exited with 1 +++
The more interesting part can be found in the log file of the last child pid (23103) which is responsible for the deletion of the logical volume:
set_robust_list(0x7f8da4af2ba0, 24) = 0
execve("/usr/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 7 vars */]) = -1 ENOENT (No such file or directory)
execve("/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 7 vars */]) = -1 ENOENT (No such file or directory)
exit_group(1) = ?
+++ exited with 1 +++
Now I decided to run the exacte same lxc-destroy command "manually" on the command line, also using strace to log and catch a difference:
strace -s 1000 -ff -o /var/log/manualdestroy.log lxc-destroy -n mycontainer
And indeed! In the log file of the last child process I found this:
set_robust_list(0x7fde2db25ba0, 24) = 0
execve("/usr/local/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/local/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = 0
brk(0) = 0x1a48000
I don't have to be a C programmer to understand what happens. When the command was launched by cron, the chlid process trying to remove the logical volume was looking for the lvremove command in /usr/bin and in /bin. When it didn't find it there, it exited with an error.
When I ran it manually, the same command was not only looking in /usr/bin and /bin, but also in /usr/local/sbin, /usr/local/bin, /usr/sbin and finally in /sbin where it found the lvremove command and continued.
But why is there a difference in which directories commands should be looked for? The answer is PATH.
The $PATH variable of root when run on the cli:
echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Now to see the $PATH variable of cron, I created a cronjob like this:
58 08 * * * echo $PATH > /var/log/path.log
And now let's take a look at /var/log/path.log:
cat /var/log/path.log
/usr/bin:/bin
Here we have the difference - and the reason why lxc-destroy does not work when run by cron.
So the solution for this problem is to define the PATH variable before running lxc-destroy, e.g. like this:
21 09 * * * export PATH=$PATH:/sbin; lxc-stop -n nzzad-app02-test && lxc-destroy -n nzzad-app02-test
Now lxc-destroy ran perfectly.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder