I needed to figure out, how many lines an Excel XSLX file contained, in order to use the number of lines for a division by a second number. The result of the division would then decide, how often a script needs to run.
So to write it down more clearly:
$numlinesExcelSheet / $divisor = $result
i=0
while [ $i -lt $result ] ; do /tmp/runscript.sh; let i++; done
Got it? OK!
But how am I able to find out the number of lines of the Excel sheet? As it's common knowledge, the xlsx file is a binary file, so I cannot use a simple command as "wc".
On my research I came across two possibilities (there are probably even more), which are easily installed and deployed. I will explain how to use them and how to quickly read the number of lines of the Excel sheet.
But first, let's create an XLSX file with some basic content:
The Perl way
I came across the CPAN perl module Spreadsheet::XLSX, which seems to do the job. As a requirement I needed to install the perl module Text::Iconv (perl-Text-Iconv package on openSUSE and libtext-iconv-perl in Debian/Ubuntu). Furthermore I installed the Spreadsheet::XLSX perl module through cpan:
cpan[1]> install Spreadsheet::XLSX
By running the example script, shown in the synopsis, the output looks like this:
# perl xlsx.pl
Sheet: Sheet1
( 0 , 0 ) => sfd
( 1 , 0 ) => fdasfd
( 2 , 0 ) => dsfafd
( 3 , 0 ) => dfadfw
( 4 , 0 ) => afda
( 5 , 0 ) => asdf
( 6 , 0 ) => test
Only bummer here is that the first row is counted in a typical array manner, starting at 0. So when I use the $sheet -> {MaxRow} variable, it actually outputs 6, instead of 7 as seen in the screenshot.
To handle this, I slightly modified the perl script and simply use $sheet -> {MaxRow} + 1:
# cat test.pl
#!/bin/perl
use Text::Iconv;
my $converter = Text::Iconv -> new ("utf-8", "windows-1251");
# Text::Iconv is not really required.
# This can be any object with the convert method. Or nothing.
use Spreadsheet::XLSX;
my $excel = Spreadsheet::XLSX -> new ('test.xlsx', $converter);
foreach my $sheet (@{$excel -> {Worksheet}}) {
printf("Number of lines: %d\n", $sheet -> {MaxRow} + 1);
}
# perl test.pl
Number of lines: 7
The Bash / Shell way
In Bash there is, to my knowledge, no module or extension which would be able to directly read the contents of an Excel file. But there is however an interesting package which can be installed (at least on Debian and Ubuntu): xlsx2csv.
With this tool, an Excel xlsx file can be converted into a CSV. A CSV is a normal text file again and therefore the number of lines can simply be read:
# xlsx2csv test.xlsx test.csv
# cat test.csv
sfd
fdasfd
dsfafd
dfadfw
afda
asdf
test
# cat test.csv | wc -l
7
What now?
Both ways are working, that's the good news! The perl way seems to be faster to me (because the conversion from xlsx to csv and then read the lines takes longer) but I got into problems trying to read the number of lines on a large Excel file (26MB). The following errors were shown thousands of times:
Use of uninitialized value $t in concatenation (.) or string at /usr/lib/perl5/site_perl/5.20.1/Spreadsheet/XLSX.pm line 49.
Eventually, after running for more than 5minutes, the process was killed:
# time perl test.pl
[...]
Use of uninitialized value $t in concatenation (.) or string at /usr/lib/perl5/site_perl/5.20.1/Spreadsheet/XLSX.pm line 49.
Killed
real 5m12.927s
user 2m49.066s
sys 0m29.516s
On the other hand, the bash way was also working fine with the same large Excel file:
# time xlsx2csv test2.xlsx test2.csv; time wc -l test2.csv
real 1m1.378s
user 1m1.133s
sys 0m0.231s
170768 test2.csv
real 0m0.018s
user 0m0.001s
sys 0m0.017s
So for my scenario I'll therefore choose the xlsx2csv command.
Claudio from Miami, USA wrote on Feb 19th, 2018:
I'm not sure how the counting works with several sheets in the same file. You'd have to test this. Xls2csv is a package you can easily install in Debian and Ubuntu. I think it might work with xls files,too, but I didn't test this. Just go ahead and test and let me know here.
Ahfeas from Chennai wrote on Feb 19th, 2018:
What if the Excel has more than one sheets in it.
Will xlsx2csv works ?
Please let me know, Is xlsx is a package that needs to be installed or its a command ?
And further will it work with xls ?
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder