Random sampling from a file

I recently learned about the Linux command line utility shuf from browsing The Art of Command Line. This could be useful for random sampling.

Given just a file name, shuf randomly permutes the lines of the file.

With the option -n you can specify how many lines to return. So it’s doing sampling without replacement. For example,

    shuf -n 10 foo.txt

would select 10 lines from foo.txt.

Actually, it would select at most 10 lines. You can’t select 10 lines without replacement from a file with fewer than 10 lines. If you ask for more lines than the file contains, the -n option is effectively ignored and shuf returns every line of the file in random order.
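Here is a minimal sketch of that edge case, assuming GNU shuf and a small three-line file. The file contents are made up for illustration.

```shell
# foo.txt is a hypothetical three-line file created just for this demo.
printf 'alpha\nbeta\ngamma\n' > foo.txt

# Ask for 10 lines without replacement when only 3 exist.
count=$(shuf -n 10 foo.txt | wc -l)
echo "lines returned: $count"

# The result is a permutation of the file: sorting it recovers the
# same lines as sorting the original.
sorted=$(shuf -n 10 foo.txt | sort)
[ "$sorted" = "$(sort foo.txt)" ] && echo "every line appears exactly once"
```

The count is 3, not 10: shuf returns what it can and does not report an error.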

You can also sample with replacement using the -r option. In that case you can select more lines than are in the file since lines may be reused. For example, you could run

    shuf -r -n 10 foo.txt

to select 10 lines drawn with replacement from foo.txt, regardless of how many lines foo.txt has. For example, when I ran the command above on a file containing

    alpha
    beta
    gamma

I got the output

    beta
    gamma
    gamma
    beta
    alpha
    alpha
    gamma
    gamma
    beta

I don’t know how shuf seeds its random generator. Maybe from the system time. But if you run it twice you will get different results. Probably.
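If you need the same sample twice, GNU shuf has a --random-source=FILE option that tells it where to read its random bytes. The sketch below assumes GNU coreutils. The file names sample.txt and seed.bin are hypothetical, and the 4096-byte size is simply a guess at "more than enough" for a small file.

```shell
# sample.txt and seed.bin are hypothetical names used only for this demo.
seq 1 100 > sample.txt

# Save some random bytes once; reusing this file replays the same choices.
head -c 4096 /dev/urandom > seed.bin

first=$(shuf -n 5 --random-source=seed.bin sample.txt)
second=$(shuf -n 5 --random-source=seed.bin sample.txt)

# Both runs read identical bytes, so they pick identical lines.
[ "$first" = "$second" ] && echo "same sample both times"
```

Keeping seed.bin around plays the role of a saved seed. Delete it and generate a new one when you want a fresh sample.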

Bringing bash and PowerShell a little closer together

I recently ran across PSReadLine, a project that makes the PowerShell console act more like a bash shell. I’ve just started using it, but it seems promising. I’m switching between Linux and Windows frequently these days and it’s nice to have a little more in common between the two.

I’d rather write a PowerShell script than a bash script, but I’d rather use the bash console interactively. The PowerShell console is essentially the old cmd.exe console. (I haven’t kept up with PowerShell in a while, so maybe there have been some improvements, but it’s my impression that the scripting language has moved forward and the console has not.) PSReadLine adds some bash-like console conveniences such as Emacs-like editing at the command prompt.

Update: Thanks to Will for pointing out Clink in the comments. Clink sounds like it may be even better than PSReadLine.

RSS readers on Linux

This afternoon I asked on UnixToolTip for suggestions of RSS readers on Linux. Here are the suggestions I got, in order of popularity.

Update:

Some other readers available on Linux:

For daily tips on using Unix, follow @UnixToolTip on Twitter.

Ubuntu Made Easy

I like books from No Starch Press. (This isn’t some sort of paid endorsement; I don’t make any money from them. They give me books to review, but that’s kinda necessary if I’m going to review them.) Their books are fairly dense with technical content, but they also have a casual style and a sense of humor that make them easier to read.

The latest book from No Starch is Ubuntu Made Easy: A Project-Based Introduction to Linux and it lives up to the expectations I have of No Starch books. It’s sort of a GUI counterpart to The Linux Command Line from the same publisher.

Ubuntu Made Easy is all about doing common tasks with Ubuntu. It’s primarily aimed at non-technical users, but programmers are often in the same boat as everyone else when they’re managing photos etc. rather than writing code and would find the book handy. It is primarily about Ubuntu specifically rather than Linux in general. In particular, it focuses on the current version, version 12.04, and its Unity user interface.

The book reads like the best books on how to use Windows or Mac, only for Ubuntu. By that I mean it has the level of polish and detail that I’ve more often seen in books written for those operating systems than in books written for Linux. I’d feel good about giving a copy of this book to someone who hasn’t used Linux.

There’s one part of the book that seemed a little out of place: Chapter 8, an introduction to the command line. Since this book is mostly about using the GUI and is aimed at a broad audience, some readers might be intimidated by this. If so, I hope they just skip over Chapter 8 since the rest of the book doesn’t depend much on it.

Related post: Why Food for the Hungry runs Ubuntu

The most brutal man page

In The Linux Command Line, the author describes the bash man page* as “the most brutal man page of them all.”

Many man pages are hard to read, but I think that the grand prize for difficulty has to go to the man page for bash. As I was doing my research for this book, I gave it a careful review to ensure that I was covering most of its topics. When printed, it’s over 80 pages long and extremely dense, and its structure makes absolutely no sense to a new user.

On the other hand, it is very accurate and concise, as well as being extremely complete. So check it out if you dare, and look forward to the day when you can read it and it all makes sense.

* If you’re not familiar with Unix lingo, “man” stands for “manual”. Man pages are online documentation.

Related post: Review of The Linux Command Line

Review: The Linux Command Line

No Starch Press recently released The Linux Command Line: A Complete Introduction (ISBN 1593273894) by William E. Shotts, Jr.

True to its name, the book is about using Linux from the command line. It’s not an encyclopedia of Linux. It doesn’t explain how to install Linux, doesn’t go into system APIs, and says little about how to administer Linux. At the same time, the book is broader than just a book on bash. It’s about how to “live” at the command line.

The introduction explains the intended audience.

This book is for Linux users who have migrated from other platforms. Most likely you are a “power user” of some version of Microsoft Windows.

The book has a conversational style, explaining the motivation behind ways of working as well as providing technical detail. It includes small but very useful suggestions along the way, the kinds of tips you’d pick up from a friend but might not find in a book.

The book has four parts:

  1. Learning the shell
  2. Configuration and the environment
  3. Common tasks and essential tools
  4. Writing shell scripts

The book could have just included the first three parts; the fourth part is a bit more specialized than the others. If you’d prefer, think of the book as having three parts, plus a lengthy appendix on shell scripting.

The Linux Command Line is pleasant to read. It has a light tone, while also getting down to business.

Related post:
Perverse hipster desire for retro-computing

Unix tool tips

I’ve renamed my SedAwkTip twitter account to UnixToolTip to reflect its new scope. If you were following SedAwkTip, there’s no need to do anything. You’ll just see a different name.

I have about a week’s worth of sed and awk tips scheduled. Then I’ll start adding in tips on grep, find, uniq, etc. And I’ll come back to sed and awk now and then.

These tools came from the Unix world, but they’re also available on Windows.

For now I’m keeping the original icon. I’m open to suggestions if someone has an idea for a better icon.

s///

Command option patterns

Here are some common patterns in Unix command options. This is a summary of the patterns Eric Raymond describes in The Art of Unix Programming.

    Option  Typical meaning
    -a      All, append
    -b      Buffer, block size, batch
    -c      Command, check
    -d      Debug, delete, directory
    -D      Define
    -e      Execute, edit
    -f      File, force
    -h      Headers, help
    -i      Initialize
    -I      Include
    -k      Keep, kill
    -l      List, long, load
    -m      Message
    -n      Number, not
    -o      Output
    -p      Port, protocol
    -q      Quiet
    -r      Recurse, reverse
    -s      Silent, subject
    -t      Tag
    -u      User
    -v      Verbose
    -V      Version
    -w      Width, warning
    -x      Enable debugging, extract
    -y      Yes
    -z      Enable compression
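As a rough illustration of how a script might follow these conventions, here is a small sketch using the shell's built-in getopts. The function name parse_args, the program name demo, and the choice of options are all made up for this example; they are not taken from Raymond's text.

```shell
# Hypothetical option parser that follows a few of the conventions above:
#   -h help, -v verbose, -q quiet, -n NUMBER, -o OUTPUT
parse_args() {
    verbose=0
    number=1
    output=/dev/stdout
    OPTIND=1    # reset so the function can be called more than once
    while getopts "hvqn:o:" opt; do
        case "$opt" in
            h) echo "usage: demo [-h] [-v] [-q] [-n number] [-o output]"
               return 0 ;;
            v) verbose=1 ;;
            q) verbose=0 ;;
            n) number=$OPTARG ;;
            o) output=$OPTARG ;;
            *) return 2 ;;    # unknown option; getopts has already complained
        esac
    done
    echo "verbose=$verbose number=$number output=$output"
}

parse_args -v -n 10 -o out.txt
# prints: verbose=1 number=10 output=out.txt
```

Sticking to the familiar letters means users can often guess what an option does before reading the documentation.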

Why Food for the Hungry runs Ubuntu

Rick Richter is CIO of Food for the Hungry. In this interview Rick explains why his organization is moving all of its computers to Ubuntu.

Ethiopian farmer Ato Admasu. Photo credit Food for the Hungry.

John: Tell me a little about Food for the Hungry and what you do there.

Rick: Food for the Hungry is a Christian relief and development organization. We go in to relief situations—maybe there has been a natural disaster or war—and provide life-sustaining needs: food, shelter, whatever the need may be. For example, the recent earthquake in Haiti. But the other part of what we do is the sustained, long-term development on the community level. The idea is to work with leaders and churches to better take care of themselves rather than relying on outside organizations for support.

I’m the CIO. I’m in charge of the information and technology for the organization. We’re in 25 countries. I have staff all over the world, about 25 people. There are about 12 who work directly for global IT, mostly in Phoenix, and the rest in various countries. There are also people who work directly for local offices, for example in Kenya, that coordinate with global IT. We’re responsible for about 900 computers.

John: You and I were talking the other day about your organization’s project to move all its computers over to Ubuntu.

Rick: We started an informal process to convert to Ubuntu two and a half years ago. It started when my son went to Bangladesh. He spent the summer there and converted some of their computers to Ubuntu. At first we didn’t have full management support for the process. They don’t really understand it and it scares them.

There were individual country directors interested in the project and I talked it up. There’s some independence in the organization to make those kinds of decisions. Now, for the first time, we have full support of management for the conversion on a wide scale. I’m going to Cambodia next week. Right now they’re all running Windows but before I leave they’ll be running Ubuntu. In Asia we probably have about 80% of our computers on Ubuntu. We don’t have big offices in Asia. Our bigger offices are in Africa and they’re a little slower to adopt. Until now, a lot of it depended on whether the local country director was ready to change.

We found it was important for a number of reasons. One is security. Linux is not as vulnerable to viruses. We have so many places where entire computer systems have been totally crippled because of viruses. A lot of networks are very primitive, so the network is basically a thumb drive between offices in a country. A thumb drive is the best way to transmit viruses you can find.

We’ve also found in the last few years anti-virus software has become less and less effective. Three or four years ago, if you had up-to-date anti-virus software you wouldn’t get a virus. These days, you still get them. Some of our staff have other jobs within FH besides their IT responsibilities and may not have a lot of IT experience. As a result, staff often do not have the time to pro-actively manage IT.

Another issue is maintainability. Windows computers don’t run as well over time. With Ubuntu, when we come back to a computer two years later it’s in as good a shape as we left it.

Linux requires much less hardware to run than Windows. We have eight- or nine-year-old computers at a lot of our sites that will no longer run or barely run Windows.

John: So saving money on software licenses is a benefit, but not the main consideration.

Rick: Saving money on licenses is important, but it’s not the driving force. We’re a non-profit and we have a contract with Microsoft where we get pretty good prices.

Another reason for moving to Ubuntu is that in some countries it is very difficult to legally obtain licenses. Sometimes it’s next to impossible. You can’t buy legal Microsoft licenses in some places, or if you can, the price is outrageous. So many legalities and so many weird hoops you have to jump through.

As a Christian organization we need to set a good example and make sure all our licenses are legal. We want to be clear and up-front about our software. Ubuntu eliminates that problem.

John: What experience have you had retraining your IT people to support Linux?

Rick: We have IT professionals and we have people who are much less skilled. Most of the IT people who do the support have really bought into it. They’re excited about it and they’re pushing it. Those who do support in the field who have had less exposure, some of them have bought into it, some have not as much. It requires time. It requires dedication. It also required commitment from their management.
