Exporting Excel files to CSV with in2csv

This post shows how to export an Excel file to a CSV file using in2csv from the csvkit package.

You could always use Excel itself to export an Excel file to CSV but there are several reasons you might not want to. First and foremost, you might not have Excel. Another reason is that you might want to work from the command line in order to automate the process. Finally, you might not want the kind of CSV format that Excel exports.

For illustration I made a tiny Excel file. In order to show how commas are handled, populations contain commas but areas do not.

See post content for text dump of data

When I ask Excel to export the file I get

    State,Population,Area
    CA,"39,500,000",163695
    TX,"28,300,000",268596
    FL,"31,000,000",65758

Note that areas are exported as plain integers, but populations are exported as quoted strings containing commas.

Using csvkit

Now install csvkit and run in2csv.

    $ in2csv states.xlsx
    State,Population,Area
    CA,39500000,163695
    TX,28300000,268596
    FL,31000000,65758

The output goes to standard out, though of course you could redirect the output to a file. All numbers are exported as numbers, with no thousands separators. This makes the output easier to use from a program that does crude parsing [1]. For example, suppose we save states.xlsx to states.csv using Excel then ask cut for the second column. Then we don’t get what we want.

    $ cut -d, -f2 states.csv
    Population
    "39
    "28
    "31

But if we use in2csv to create states.csv then we get what we’d expect.

    cut -d, -f2 states.csv
    Population
    39500000
    28300000
    31000000

Multiple sheets

So far we’ve assumed our Excel file has a single sheet. I added a second sheet with data on US territories. The sheet doesn’t have a header row just to show that the header row isn’t required.

PR\t3,300,000\t5325\nGuam\t161,700\t571

I named the two sheets “States” and “Territories” respectively.

States, Territories

Now if we ask in2csv to export our Excel file as before, it only exports the first sheet. But if we specify the Territories sheet, it will export that.

    $ in2csv --sheet Territories states.xlsx
    PR,3300000,5325
    Guam,161700,571

More CSV posts

[1] The cut utility isn’t specialized for CSV files and doesn’t understand that commas inside quotation marks are not field separators. A specialized utility like csvtool would make this distinction. You could extract the second column with

    csvtool col 2 states.csv

or

    csvtool namedcol Population states.csv

This parses the columns correctly, but you still have to remove the quotation marks and thousands separators.