Today I needed to know why a certain process (81113) suddenly increases its memory. One of my guesses was that it could have something to do with a NFS mount where the relevant process sometimes needs to grab data from. I wanted to know exactly how long it takes to open a file from the NFS server. And here's how I did it.
First I launched a basic dtrace system call looking for files opened by PID 81113.
dtrace -n 'syscall::open:entry {printf ("at %d on %s", timestamp, copyinstr(arg0))}' -p 81113
20 11197 open:entry at 13397986839993578 on /opt/datasite/modules/class/location.php
20 11197 open:entry at 13397986844799339 on /opt/datasite/modules/class/Navigation/Trans/Filter.class.php
20 11197 open:entry at 13397986844917220 on /opt/datasite/modules/class/Navigation/Trans/Auth.class.php
20 11197 open:entry at 13397986850024255 on /opt/datasite/modules/class/Navigation/Trans/Conditions.class.php
23 11197 open:entry at 13397987105896708 on /opt/datasite/include/locale.js.php
As additional information I printed out the current timestamp (in nanoseconds). This output gave me information about which file was opened at which time.
Thanks to the timestamp information it is possible to find out how long it took until the file was opened/accessed. To do that, it is necessary to combine syscall::open:entry (the request) and syscall::open:return (the return) and take both timestamps (once at the entry and then at the return):
dtrace -n 'syscall::open:entry {self->start = timestamp; self->file = copyinstr(arg0)} syscall::open:return {printf("%d ns on %s", timestamp - self->start, self->file); self->start = 0;}' -p 81113
8 11198 open:return 9402 ns on /opt/datasite/functions/Mail.function.php
8 11198 open:return 9733 ns on /opt/datasite/functions/process.php
8 11198 open:return 9145 ns on /opt/datasite/include/soap.function.php
8 11198 open:return 1039952 ns on /mounts/cache/blog/static/index.html
As you can see, there are two syscall providers launched within the same dtrace command. At execution of the first provider the current timestamp is placed in a variable (self->start) and the filename with its full path is also set into a variable (self->file).
The second syscall provider is then launched and this time there's actually the output (printf) of the values. Instead of just returning the second timestamp, a subtraction of timestamps took place (current timestamp - self->start = time difference). At the end self->start is reset to 0 again.
To filter only the very slow access times, it is possible to create an "if statement" to the dtrace command. In this case I'd like to see only the lines where the access time was longer than 100000ns:
dtrace -n 'syscall::open:entry {self->start = timestamp; self->file = copyinstr(arg0)} syscall::open:return /(timestamp - self->start) > 100000/ {printf("%d ns on %s", timestamp - self->start, self->file); self->start = 0;}' -p 81113
23 11198 open:return 208195 ns on /mounts/cache/blog/static/image/bla2.jpg
23 11198 open:return 200908 ns on /mounts/cache/blog/static/css/stylesheet.css
23 11198 open:return 199587 ns on /mounts/cache/blog/include/test.php
23 11198 open:return 199146 ns on /mounts/cache/blog/static/image/bla.jpg
Note the following part right after the second syscall provider is launched: /(timestamp - self->start) > 100000/
This final output can now be used to find bottlenecks or very slow file systems or mount points.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder