Arrrrgghh!
This is pretty much a summary of my research over the last couple of days. For several days now, Apache has been behaving strangely: the load suddenly increases and some Apache child processes use up to 100% of the CPU.
The top output shows three Apache processes using most of the CPU:
PID  USER     PR NI VIRT RES  SHR S %CPU %MEM TIME+    COMMAND
5439 www-data 20 0  626m 170m 64m S 76   4.3  51:43.32 apache2
4355 www-data 20 0  680m 173m 58m S 43   4.4  34:08.04 apache2
3522 www-data 20 0  630m 205m 64m S 39   5.2  39:03.98 apache2
If we take a detailed look at the open connections with lsof, we see the following:
# lsof -i :80
COMMAND PID   USER     FD  TYPE DEVICE     SIZE NODE NAME
apache2 3522  www-data 26u IPv6 1210315466      TCP  server:www->crawl-66-249-71-78.googlebot.com:54107 (CLOSE_WAIT)
apache2 4355  www-data 39u IPv6 1210322237      TCP  server:www->crawl-66-249-66-136.googlebot.com:36722 (CLOSE_WAIT)
apache2 5439  www-data 26u IPv6 1210335205      TCP  server:www->crawl-66-249-66-136.googlebot.com:62305 (CLOSE_WAIT)
apache2 5439  www-data 30u IPv6 1210345350      TCP  server:www->crawl-66-249-66-136.googlebot.com:40885 (CLOSE_WAIT)
apache2 13904 www-data 3u  IPv6 1210044289      TCP  *:www (LISTEN)
apache2 14633 www-data 3u  IPv6 1210044289      TCP  *:www (LISTEN)
apache2 14633 www-data 28u IPv6 1210440119      TCP  server:www->195.188.250.137:17518 (ESTABLISHED)
apache2 16314 root     3u  IPv6 1210044289      TCP  *:www (LISTEN)
Surprise, surprise: the same processes from the top output show up again. We can also see that they are no longer listening for new HTTP connections (in the meantime, three new child processes were spawned). Yet the old processes are still alive, holding connections to Googlebot in the CLOSE_WAIT state.
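Picking the CLOSE_WAIT holders out of a long lsof listing by eye gets tedious. Below is a minimal Python sketch that assumes the column layout shown above: the PID is in the second column, the connection name is in the second-to-last column, and the TCP state appears in parentheses at the end of the line. The function name close_wait_pids is hypothetical and is not part of lsof. The function groups the remote endpoints by PID, so you can compare them against the PIDs reported by top.

```python
def close_wait_pids(lsof_output, state="CLOSE_WAIT"):
    """Group the remote endpoints of sockets in `state` by owning PID.

    Assumes `lsof -i` style lines: PID in the 2nd column, the connection name
    in the second-to-last column and the TCP state in parentheses at the end.
    """
    result = {}
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header line, blank lines and anything without a numeric PID.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        # Keep only sockets in the requested TCP state.
        if fields[-1] != f"({state})":
            continue
        pid = int(fields[1])
        name = fields[-2]  # e.g. server:www->host:port
        remote = name.split("->", 1)[1] if "->" in name else name
        result.setdefault(pid, []).append(remote)
    return result
```

On a live system you could feed it `subprocess.run(["lsof", "-i", ":80"], capture_output=True, text=True).stdout`. Run against the listing above, it returns PIDs 3522, 4355 and 5439, which are the same three processes that top reports.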
The question now is: what can I (and anyone else who runs into this problem) do? By definition, CLOSE_WAIT means that the remote side has closed the connection, but the local process has not yet closed its end. And why does it happen only with Googlebot (which could indicate an improper close from the remote side)?
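To illustrate that definition, here is a small Python sketch that reproduces the sequence on localhost. It only demonstrates TCP semantics and does not model Apache's internals. A client socket stands in for Googlebot, and a plain accepted socket stands in for an Apache child. Once the client closes, the server's recv() returns b"" (end of file), and the server-side socket stays in CLOSE_WAIT until the server calls close() itself.

```python
import socket


def demo_close_wait():
    """Reproduce the TCP sequence that leaves a server-side socket in CLOSE_WAIT."""
    # Listening socket on an ephemeral localhost port (stand-in for Apache on :80).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    # The remote side (stand-in for Googlebot) connects...
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    # ...and closes its end; its kernel sends a FIN to the server.
    client.close()

    # recv() returns b"" once the FIN has arrived. The socket entered CLOSE_WAIT
    # when that FIN arrived and stays there until conn.close() is called.
    data = conn.recv(1024)

    # A well-behaved server closes its end after seeing EOF. A process that
    # never reaches this point keeps the socket in CLOSE_WAIT indefinitely.
    conn.close()
    server.close()
    return data


if __name__ == "__main__":
    print(repr(demo_close_wait()))  # prints b''
```

If you pause the function right after the recv() call, lsof on the same machine should list the accepted connection as CLOSE_WAIT, just like the Apache children above.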
If anyone has a solution to this problem, please let me know. And no, blocking Googlebot is not an option.
For now, the only workaround is to kill the affected child processes. This is not dangerous, since all other HTTP connections are handled by the newly spawned processes, but it is not nice (remember, killing is not nice).
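If you have to kill the stuck children, a slightly gentler approach is to send SIGTERM first and fall back to SIGKILL only after a grace period. The following Python sketch rests on three assumptions. First, you already know the child PIDs, for example from the lsof output above. Second, you never pass it the root-owned apache2 process, which in a typical setup is the parent. Third, the zombie check runs on Linux, because it relies on /proc. The function names stop_child and is_running and the 5-second grace period are hypothetical choices.

```python
import os
import signal
import time


def is_running(pid):
    """Return True if `pid` exists and is not a zombie.

    The zombie check reads /proc/<pid>/stat and therefore only works on Linux;
    elsewhere the result of the signal-0 probe is trusted as-is.
    """
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The state letter follows the ")" that closes the command name.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        return True
    return state != "Z"


def stop_child(pid, grace=5.0, poll=0.1):
    """Send SIGTERM, wait up to `grace` seconds, then SIGKILL if still running.

    Returns the name of the last signal that was sent.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_running(pid):
            return "SIGTERM"
        time.sleep(poll)
    if not is_running(pid):
        return "SIGTERM"  # exited right at the deadline
    try:
        os.kill(pid, signal.SIGKILL)  # the polite request was ignored
    except ProcessLookupError:
        return "SIGTERM"
    return "SIGKILL"
```

For example, running `for pid in (3522, 4355, 5439): print(pid, stop_child(pid))` as root would handle the three processes from the listing above.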
Update February 7th 2011: I was able to identify the cause and solve it; see Googlebot and Apache CLOSE_WAIT's: SOLVED!