Splitting lines and numbering the pieces

As I mentioned in my computational survivalist post, I’m working on a project where I have a dedicated computer with little more than basic Unix tools, ported to Windows. It’s given me new appreciation for how the standard Unix tools fit together; I’ve had to rely on them for tasks I’d usually do a different way.

I’d seen the nl command before for numbering lines, but I thought “Why would you ever want to do that? If you want to see line numbers, use your editor.” That way of thinking looks at the tools one at a time, asking what each can do, rather than thinking about how they might work together.

Today, for the first time ever, I wanted to number lines from the command line. I had a delimited text file and wanted to see a numbered list of the column headings. I’ve written before about how you can extract columns using cut, but you have to know the number of a column to select it. So it would be nice to see a numbered list of column headings.

The data I’m working on is proprietary, so I downloaded a PUMS (Public Use Microdata Sample) file named ss04hak.csv from the US Census to illustrate instead. The first line of this file is

RT,SERIALNO,DIVISION,MSACMSA,PMSA,PUMA,REGION,ST,ADJUST,WGTP,NP,TYPE,ACR,AGS,BDS,BLD,BUS,CONP,ELEP,FULP,GASP,HFL,INSP,KIT,MHP,MRGI,MRGP,MRGT,MRGX,PLM,RMS,RNTM,RNTP,SMP,TEL,TEN,VACS,VAL,VEH,WATP,YBL,FES,FINCP,FPARC,FSP,GRNTP,GRPIP,HHL,HHT,HINCP,HUPAC,LNGI,MV,NOC,NPF,NRC,OCPIP,PSF,R18,R65,SMOCP,SMX,SRNT,SVAL,TAXP,WIF,WKEXREL,FACRP,FAGSP,FBDSP,FBLDP,FBUSP,FCONP,FELEP,FFSP,FFULP,FGASP,FHFLP,FINSP,FKITP,FMHP,FMRGIP,FMRGP,FMRGTP,FMRGXP,FMVYP,FPLMP,FRMSP,FRNTMP,FRNTP,FSMP,FSMXHP,FSMXSP,FTAXP,FTELP,FTENP,FVACSP,FVALP,FVEHP,FWATP,FYBLP

I want to grab the first line of this file, replace commas with newlines, and number the results. That’s what the following one-liner does.

    head -n 1 ss04hak.csv | sed "s/,/\n/g" | nl

The output looks like this:

     1  RT
     2  SERIALNO
     3  DIVISION
     4  MSACMSA
     5  PMSA
...
   100  FWATP
   101  FYBLP

Now if I wanted to look at a particular field, I could see the column number without putting my finger on my screen and counting. Then I could use that column number as an argument to cut -f, along with -d , since the file is comma-delimited and cut splits on tabs by default.
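Here is a minimal sketch of that last step. It assumes a tiny stand-in file, sample.csv, which is made up here so the commands run without the Census download; the data rows are illustrative, not real PUMS values. It also assumes no field contains a quoted comma, since cut knows nothing about CSV quoting.

```shell
# sample.csv is a hypothetical stand-in for ss04hak.csv.
# The header matches the first five headings above; the data rows are invented.
printf 'RT,SERIALNO,DIVISION,MSACMSA,PMSA\nH,1,9,9999,9999\nH,2,9,9999,9999\n' > sample.csv

# The numbered list shows DIVISION as heading 3, so ask cut for field 3.
# -d , sets the delimiter to a comma; without it cut would split on tabs.
division=$(cut -d , -f 3 sample.csv)
printf '%s\n' "$division"
```

The same pattern should carry over to the real file by swapping in its name and the field number read off the numbered list.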

2 thoughts on “Splitting lines and numbering the pieces”

  1. You can make this slightly easier to type (and understand – at least for people familiar with the tools) by using tr rather than sed to turn commas into newlines. Although in complete contradiction of that philosophy, I think that the few times I’ve needed to number lines I’ve used perl’s $.

  2. In this case nothing between the commas seems to contain spaces, so you could also do ‘head -n 1 | tr ',' ' ' | xargs -n 1 | cat -n’. If you want it in a more compact form: ‘!! | paste - - - - - - | column -t’
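The two suggestions above can be sketched as follows. To keep the example self-contained, it feeds in only the first five headings from the post through a shell variable instead of reading ss04hak.csv; the variable names are just for illustration. The second variant relies on the headings containing no spaces or quote characters, since xargs splits on whitespace and treats quotes specially.

```shell
# The first five headings from the post, standing in for the full header line.
header='RT,SERIALNO,DIVISION,MSACMSA,PMSA'

# First suggestion: tr turns each comma into a newline, then nl numbers the lines.
with_tr=$(printf '%s\n' "$header" | tr ',' '\n' | nl)
printf '%s\n' "$with_tr"

# Second suggestion: turn commas into spaces, let xargs -n 1 print one word
# per line, then number the lines with cat -n.
with_xargs=$(printf '%s\n' "$header" | tr ',' ' ' | xargs -n 1 | cat -n)
printf '%s\n' "$with_xargs"
```

One reason to prefer tr here is portability: POSIX leaves the meaning of \n in the replacement part of a sed s command unspecified, so the sed one-liner depends on the sed implementation in use, while tr ',' '\n' does not.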
