Read or count number of lines of an Excel xlsx file in Linux on the cli

Written by - 2 comments

Published on - Listed in Linux Perl Shell Windows Coding Office


I needed to figure out, how many lines an Excel XSLX file contained, in order to use the number of lines for a division by a second number. The result of the division would then decide, how often a script needs to run.

So to write it down more clearly:

$numlinesExcelSheet / $divisor = $result
i=0
while [ $i -lt $result ] ; do /tmp/runscript.sh; let i++; done

Got it? OK!

But how am I able to find out the number of lines of the Excel sheet? As it's common knowledge, the xlsx file is a binary file, so I cannot use a simple command as "wc".

On my research I came across two possibilities (there are probably even more), which are easily installed and deployed. I will explain how to use them and how to quickly read the number of lines of the Excel sheet.

But first, let's create an XLSX file with some basic content:

XLSX File

The Perl way

I came across the CPAN perl module Spreadsheet::XLSX, which seems to do the job. As a requirement I needed to install the perl module Text::Iconv (perl-Text-Iconv package on openSUSE and libtext-iconv-perl in Debian/Ubuntu). Furthermore I installed the Spreadsheet::XLSX perl module through cpan:

cpan[1]> install Spreadsheet::XLSX

By running the example script, shown in the synopsis, the output looks like this:

# perl xlsx.pl
Sheet: Sheet1
( 0 , 0 ) => sfd
( 1 , 0 ) => fdasfd
( 2 , 0 ) => dsfafd
( 3 , 0 ) => dfadfw
( 4 , 0 ) => afda
( 5 , 0 ) => asdf
( 6 , 0 ) => test

Only bummer here is that the first row is counted in a typical array manner, starting at 0. So when I use the $sheet -> {MaxRow} variable, it actually outputs 6, instead of 7 as seen in the screenshot.

To handle this, I slightly modified the perl script and simply use $sheet -> {MaxRow} + 1:

# cat test.pl
#!/bin/perl
use Text::Iconv;
 my $converter = Text::Iconv -> new ("utf-8", "windows-1251");

 # Text::Iconv is not really required.
 # This can be any object with the convert method. Or nothing.

 use Spreadsheet::XLSX;

 my $excel = Spreadsheet::XLSX -> new ('test.xlsx', $converter);

 foreach my $sheet (@{$excel -> {Worksheet}}) {

        printf("Number of lines: %d\n", $sheet -> {MaxRow} + 1);

 }

# perl test.pl
Number of lines: 7

The Bash / Shell way

In Bash there is, to my knowledge, no module or extension which would be able to directly read the contents of an Excel file. But there is however an interesting package which can be installed (at least on Debian and Ubuntu): xlsx2csv.

With this tool, an Excel xlsx file can be converted into a CSV. A CSV is a normal text file again and therefore the number of lines can simply be read:

# xlsx2csv test.xlsx test.csv

# cat test.csv
sfd
fdasfd
dsfafd
dfadfw
afda
asdf
test

# cat test.csv  | wc -l
7

What now?

Both ways are working, that's the good news! The perl way seems to be faster to me (because the conversion from xlsx to csv and then read the lines takes longer) but I got into problems trying to read the number of lines on a large Excel file (26MB). The following errors were shown thousands of times:

Use of uninitialized value $t in concatenation (.) or string at /usr/lib/perl5/site_perl/5.20.1/Spreadsheet/XLSX.pm line 49.

Eventually, after running for more than 5minutes, the process was killed:

# time perl test.pl
[...]
Use of uninitialized value $t in concatenation (.) or string at /usr/lib/perl5/site_perl/5.20.1/Spreadsheet/XLSX.pm line 49.
Killed

real    5m12.927s
user    2m49.066s
sys    0m29.516s

On the other hand, the bash way was also working fine with the same large Excel file:

# time xlsx2csv test2.xlsx test2.csv; time wc -l test2.csv

real    1m1.378s
user    1m1.133s
sys    0m0.231s
170768 test2.csv

real    0m0.018s
user    0m0.001s
sys    0m0.017s

 So for my scenario I'll therefore choose the xlsx2csv command.  


Add a comment

Show form to leave a comment

Comments (newest first)

Claudio from Miami, USA wrote on Feb 19th, 2018:

I'm not sure how the counting works with several sheets in the same file. You'd have to test this. Xls2csv is a package you can easily install in Debian and Ubuntu. I think it might work with xls files,too, but I didn't test this. Just go ahead and test and let me know here.


Ahfeas from Chennai wrote on Feb 19th, 2018:

What if the Excel has more than one sheets in it.
Will xlsx2csv works ?
Please let me know, Is xlsx is a package that needs to be installed or its a command ?
And further will it work with xls ?


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder