Regex != regex in sed (or: replacing digits in sed)

Listed in: Shell, Linux, Coding


This is a quick reminder to myself for the next time I run into this problem: regular expressions in sed are not exactly the same as the ones you know from elsewhere!

In my previous article "How to manually clean up Zoneminder events" I wrote a shell script in which I wanted to remove a certain part of a path:

/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12

should become:

/var/cache/zoneminder/events/5/18/12/14/06/45/12

Simple, right? Just use sed's substitute command and remove ".448512/" from the string.

But see for yourself:

$ echo "/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12" | sed "s/\.\d+\///g"
/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12

The old path is still shown. Nothing was replaced. My first thought was, of course, that I had made a mistake in my regular expression, but all the online regex checkers confirmed that my regex was correct. For example on https://regexr.com/:

[Image: Regex match dot and digit]

I was able to narrow it down to the part of the regular expression that matches the number (\d+), because simply replacing the dot character works:

$ echo "/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12" | sed "s/\.//g"
/var/cache/zoneminder/events/5/18/12/14/448512/06/45/12

And then I received the final hint from a friend: some typical regex constructs don't work in sed! Excerpt from sed's documentation:

*    Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by \, a ., a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by *; for example, a** is equivalent to a*. POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many nonGNU implementations do not support this and portable scripts should instead use \* in these contexts.

\+   As *, but matches one or more. It is a GNU extension. 

[...]

‘[a-zA-Z0-9]’  In the C locale, this matches any ASCII letters or digits.

So first of all, the plus sign (+) must be escaped as \+. And second, \d doesn't match a digit in sed; a bracket expression in [0-9] style must be used instead!

With these adjustments, sed now finally performs the replacement:
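
Both points can be checked in isolation with a few throwaway commands. This is only a sketch that assumes GNU sed; the variable names are made up for the demo, and the -E switch and the [[:digit:]] class are not part of the excerpt above:

```shell
# In sed's default (basic) syntax an unescaped + is an ordinary character,
# so this pattern matches the literal text "a+b".
plus_literal=$(echo 'a+b' | sed 's/a+b/X/')

# GNU sed accepts \+ as "one or more" (a GNU extension, see the excerpt).
plus_gnu=$(echo 'aaab' | sed 's/a\+b/X/')

# With -E (extended regular expressions) a bare + is the quantifier.
plus_ere=$(echo 'aaab' | sed -E 's/a+b/X/')

# The POSIX class [[:digit:]] is an alternative spelling of [0-9].
digits=$(echo 'id=448512' | sed 's/[[:digit:]][[:digit:]]*/N/')

printf '%s\n' "$plus_literal" "$plus_gnu" "$plus_ere" "$digits"
```

The first three commands print X, the last one prints id=N.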

$ echo "/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12" | sed "s/\.[0-9]\+\///g"
/var/cache/zoneminder/events/5/18/12/14/06/45/12
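
For reference, here are two other ways to write the same substitution. Treat them as a sketch, not as what the original script uses: the function names strip_ere and strip_bre are made up for this example, and the first variant assumes a sed that supports -E (GNU and BSD sed do). Using # as the delimiter avoids having to escape the slash.

```shell
path="/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12"

# Extended syntax (-E): + needs no backslash.
strip_ere() { printf '%s\n' "$1" | sed -E 's#\.[0-9]+/##g'; }

# Basic syntax without GNU extensions: "one or more digits"
# spelled out as one digit followed by zero or more digits.
strip_bre() { printf '%s\n' "$1" | sed 's#\.[0-9][0-9]*/##g'; }

strip_ere "$path"
strip_bre "$path"
```

Both calls print /var/cache/zoneminder/events/5/18/12/14/06/45/12.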

Dang it, I am sure that I ran into this at least once already in my Linux career. Hence this post, so that I don't lose as much time when it happens again.

