
My most popular posts on Reddit

There are only three posts on this top 10 list that are also on the top 10 list for Hacker News.

The valley of medium reliability

Last evening my electricity went out, and this morning it was restored. This got me thinking about systems that fail occasionally [1]. Electricity goes out often enough that we prepare for it: we have candles and flashlights, my work computer is on a UPS, etc.

A residential power outage is usually just an inconvenience, especially if the power comes back on within a few hours. A power outage to a hospital could be disastrous, and so hospitals have redundant power systems. The problem is in between: power that is reliable enough that you don’t expect it to go out, but where the consequences of an outage are serious [2].

If a system fails occasionally, you prepare for that. And if it never fails, that’s great. In between is the problem, a system just reliable enough to lull you into complacency.

Dangerously reliable systems

For example, GPS used to be unreliable. It made useful suggestions, but you wouldn’t blindly trust it. Then it got a little better and became dangerous as people trusted it when they shouldn’t. Now it’s much better. Not perfect, but less dangerous.

For another example, people who live in flood plains have flood insurance. Their mortgage company requires it. And people who live on top of mountains don’t need flood insurance. The people at most risk are in the middle. They live in an area that could flood, but since it hasn’t flooded yet, they don’t buy flood insurance.


So safety is not an increasing function of reliability, not always. It might dip down before going up. There’s a valley between unreliable and highly reliable where people are tempted to take unwise risks.

Artificial intelligence risks

I expect we’ll see a lot of this with artificial intelligence. Clumsy AI is not dangerous; pretty good AI is dangerous. Moderately reliable systems in general are dangerous, but this especially applies to AI.

As in the examples above, the better AI becomes, the more we rely on it. But there’s something else going on: as AI failures become less frequent, they also become weirder.

Adversarial attacks

You’ll see stories of someone putting a tiny sticker on a stop sign and now a computer vision algorithm thinks the stop sign is a frog or an ice cream sundae. In this case, there was a deliberate attack: someone knew how to design a sticker to fool the algorithm. But strange failures can also happen unprompted.

Unforced errors

Amazon’s search feature, for example, is usually very good. Sometimes I’ll get every word in a book title wrong and yet it will figure out what I meant. But one time I was searching for the book Universal Principles of Design.

I thought I remembered a “25” in the title. The subtitle turns out to be “125 ways to enhance usability …” I searched on “25 Universal Design Principles” and the top result was a massage machine that will supposedly increase the size of a woman’s breasts. I tried the same search again this morning. The top result is a book on design. The next five results are

  1. a clip-on rear view mirror
  2. a case of adult diapers
  3. a ratchet adapter socket
  4. a beverage cup warmer, and
  5. a folding bed.

The book I was after, and whose title I remembered pretty well, was nowhere in the results.

Because AI is literally artificial, it makes mistakes no human would make. If I went to a brick-and-mortar book store and told a clerk “I’m looking for a book. I think the title is something like ’25 Universal Design Principles,’” the clerk would not say “Would you like to increase your breast size? Or maybe buy a box of diapers?”

In this case, the results were harmless, even entertaining. But unexpected results in a mission-critical system would not be so entertaining. Our efforts to make systems foolproof have been based on experience with human fools, not artificial ones.

[1] This post is an elaboration on what started as a Twitter thread.

[2] I’m told that in Norway electrical power is very reliable, but the country is also very dependent on electricity, including for heating. Alternative sources of fuel such as propane are hard to find.

How to keep unwanted content out of your Twitter stream

How do you keep things you don’t want out of your Twitter stream? You might say just don’t follow people who post things you don’t want to read, but it’s not that simple.

Some people post worthwhile original material, but they retweet things that are offensive or just not interesting. You can fix that by turning off retweets from that person. Then you’ll just see tweets they compose.

Except that until yesterday, there was no way to turn off “likes”: you’d randomly see things someone “liked” even if you turned off their retweets. Now there’s a way to see only the content you’ve subscribed to. Not only that, you’ll see it in order! Numerous times I’ve tried to go back and find something but couldn’t, because Twitter saw fit to edit and rearrange my stream since the last time I looked at it.

The way to simply see your Twitter stream in order isn’t obvious. You have to go to

Settings and privacy -> Account

and uncheck the box that says “Show the best Tweets first.”

Timeline: Show the best Tweets first

Who wouldn’t want to see the best tweets first? Sounds good to me. But by unchecking the box you’re effectively saying “Let me decide what’s best by who I choose to follow.”

I’m pleased by this new feature (actually, a new ability to turn off a feature). I’ve tried to maintain a decent signal-to-noise ratio in my Twitter stream, and Twitter has continually tried to erode it, until now.

Density of the Great Pacific Garbage Patch

The Great Pacific Garbage Patch (GPGP) is a huge region of ocean trash twice the area of Texas. I’m trying to understand how dense it is, and running into contradictory information.

This article describes a project, Ocean Cleanup, that aims to clean up half the GPGP in five years. How could you possibly clean up a garbage patch bigger than Texas in five years? That made me suspect the GPGP isn’t as dense a garbage patch as I imagined, and it’s not.

The article mentioned above says Ocean Cleanup would remove 5.5 metric tons of trash a month, and clean up half the GPGP in five years [1]. (I hope they can!) That implies the GPGP contains 660 metric tons of trash. Wikipedia says it contains 80,000 metric tons of trash. Somebody is off by two orders of magnitude! If Wikipedia is right about the mass, and if Ocean Cleanup is right that they can remove half of it in five years, then they’ll have to remove 700 tons of trash per month.
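Here’s a quick sanity check of those numbers (my own arithmetic, not from either source):

    # 5.5 metric tons/month for 5 years is claimed to be half the GPGP.
    monthly_rate = 5.5                       # metric tons per month
    implied_total = 2 * monthly_rate * 12 * 5
    print(implied_total)                     # 660 metric tons

    # If Wikipedia's 80,000-ton estimate is right, removing half of it
    # in five years requires a much higher rate.
    wikipedia_total = 80_000                 # metric tons
    print(wikipedia_total / 2 / (12 * 5))    # roughly 667 tons per month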

Not exactly a garbage patch

The Wikipedia article on the GPGP does say that “garbage patch” is misleading.

There has been some controversy surrounding the use of the term “garbage patch” and photos taken off the coast of Manila in the Philippines in attempts to portray the patch in the media often misrepresenting the true scope of the problem and what could be done to solve it. Angelicque White, Associate Professor at Oregon State University, who has studied the “garbage patch” in depth, warns that “the use of the phrase ‘garbage patch’ is misleading. … It is not visible from space; there are no islands of trash; it is more akin to a diffuse soup of plastic floating in our oceans.”

Density

So how dense is it? Let’s assume 80,000 metric tons over an area twice the size of Texas. The area of Texas is 700,000 km², so that’s 8 × 10¹⁰ grams of trash over 1.4 × 10¹² square meters, or 57 milligrams per square meter.

An empty water bottle weighs about 20 grams, and an American football field covers 5300 square meters, so this would be the same density of plastic as 15 empty water bottles scattered over a football field. This is an average. No doubt the density is higher in some areas and lower in others.
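Here’s the same calculation in a few lines of Python (my own sketch, using the rough figures quoted above for the bottle and the football field):

    texas_km2 = 700_000                    # area of Texas in km²
    area_m2   = 2 * texas_km2 * 1e6        # twice Texas, in square meters
    mass_g    = 80_000 * 1e6               # 80,000 metric tons in grams
    density   = mass_g / area_m2           # grams per square meter
    print(density * 1000)                  # about 57 mg per square meter

    bottle_g  = 20                         # empty water bottle
    field_m2  = 5300                       # American football field
    print(density * field_m2 / bottle_g)   # about 15 bottles per field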

***

[1] The video in the article says Ocean Cleanup would remove half the GPGP every five years, implying that the rate of clean up will decline exponentially.

Enough group theory for now

I’ve written three blog posts lately about the classification of finite simple groups. I’m done with that topic for now. I may come back and revisit it in the future. So if group theory isn’t your favorite topic, don’t worry. I don’t know what I’ll blog about next, but it’ll probably be one of the topics I often write about.

 

Optimal amount of input

If you don’t get any outside input into your life, you’re literally an idiot, someone in your own little world. But if you get too much outside input, you become a bland cliche. I’ve written about this a couple times, and ran across a new post this morning from someone expressing a similar idea.

I first wrote about this in a short post on noise removal. After talking about signal processing, I wax philosophical.

This is a metaphor for life. If you only value your own opinion, you’re an idiot in the oldest sense of the word, someone in his or her own world. Your work may have a strong signal, but it also has a lot of noise. Getting even one outside opinion greatly cuts down on the noise. But it also cuts down on the signal to some extent. If you get too many opinions, the noise may be gone and the signal with it. Trying to please too many people leads to work that is offensively bland.

I returned to this theme in The Opposite of an Idiot:

An idiot lives only in his own world; the opposite of an idiot has no world of his own.

This morning I found a new post along these lines via Tyler Cowen. He links to a short post entitled We Can Read Without Learning at All.

We require the friction of other minds to buff away self-generated roughness. Few of us can polish ourselves. We are likelier to grow cranky and conspiracy-minded, mistaking brainstorms for insight while rediscovering what the rest of the world already knows. Had I read only the books assigned in class, I would today be only nominally literate. Had I read only the books that confirmed the thoughts I already possessed, I would remain marginally illiterate.

In our networked world, we’re more likely to have a plethora of low-quality input than to be isolated. There’s more danger of becoming a bland opinion poll than becoming a cranky idiot.

Living inside a partisan bubble may be the worst of both worlds: the blandness of herd mentality and the crankiness of isolation.

In defense of complicated measurement systems

Archaic measurement systems were much more complicated than the systems we use today. Even modern imperial units, while more complicated than metric (SI) units, are simple compared to say medieval English units. Why didn’t someone think of something like the metric system a thousand years ago? I imagine they did. I don’t think they were missing a key idea; I think they had different criteria.

Vitruvian man: human-scale measurement

Convenient units

We look back and wonder why people would have, for example, unrelated units of length for measuring the lengths of different things. (Of course these units could be related, but they’re not simply related, not like multiplying by powers of 10.) You might ask a medieval peasant, for example, why cloth is measured in different units than the distance between villages. The peasant might look at you like you’re stupid. “Why would you want the same units for such different things? Are you wanting to buy enough cloth to line the road to the next village?!”

When you buy cloth, you naturally measure it with your arms, and so it makes sense that cloth would be sold in units related to arm lengths [1]. If you’re measuring a long distance, you naturally measure it in units related to walking, or maybe in terms of the distance to the horizon. It was more important that each measurement be convenient for its purpose than for the units to be simply related to each other.

An acre was the amount of land a man could plow in one day. It was almost a unit of work rather than a unit of area. It was a very practical way to measure the size of a farm, and if it had a complicated relationship to, say, the unit of measurement for buying cloth, so be it.

Because the acre was based on the effort required to plow land, an acre was bigger in some regions than others [2]. We immediately think this is a horrible situation. But what did a medieval Scottish farmer care that an acre was a smaller unit in Scotland than in Ireland? Eventually this did get to be a problem, and the acre was standardized. Trade on a wider scale made standardization more important.

Why all these multiples of 2 or 3 between units rather than powers of 10? When you’re doing math in your head, or on paper with Roman numerals, small integer factors are nice to have.

History and compromise

Some traditional units must have seemed unnecessarily complicated even to contemporaries. These may have been the result of history. Two cultures come into contact and have to reconcile their units of measure. Or something about the world has changed, such as the wider trade mentioned above, making it necessary to compromise between what is familiar and what would be better going forward.

The metric system came out of the French Revolution. The revolutionaries weren’t concerned with history; they were prepared to blow up the world and start over. That didn’t work out well in most areas, but it did work out well for units of measure. The metric system (technically its successor SI) is now used around the world. [3]

Notes

[1] See the ell.

[2] Episode 115 of The History of English Podcast has a good explanation of this and many other related topics.

[3] It’s commonly said that the US does not use SI. It would be more accurate to say that the US does not exclusively use SI. The uses of imperial units are obvious, such as highway speeds posted in miles per hour, but SI is used quite a bit behind the scenes.

If you’d like to get daily tweets about units of measurement, follow @UnitFact on Twitter.


My most popular posts on Hacker News

Here are the most popular posts on my site according to the number of points given on Hacker News.

Variable-speed learning

When I was in college, one of the professors seemed to lecture at a sort of quadratic pace, maybe even an exponential pace.

He would proceed very slowly at the beginning of the semester, so slowly that you didn’t see how he could possibly cover the course material by the end. But his pace would gradually increase to the point that he was going very quickly at the end. And yet the pace increased so smoothly that you were hardly aware of it. By understanding the first material thoroughly, you were able to go through the latter material quickly.

If you’ve got 15 weeks to cover 15 chapters, don’t assume the optimal pace is to cover one chapter every week.

I often read technical books the way the professor mentioned above lectured. The density of completely new ideas typically decreases as a book progresses. If your reading pace is proportional to the density of new ideas, you’ll start slow and speed up.

The preface may be the most important part of a book. For some books, I’ve read only the preface and still felt like I got a lot out of the book.

The last couple of chapters of technical books can often be ignored. It’s common for authors to squeeze in something about their research at the end of a book, even if it’s out of character with the rest of the book.

Books you’d like to have read

I asked on Twitter today for books that people would like to have read, but don’t want to put in the time and effort to read.

Here are the responses I got, organized by category.

Literature:

Math, Science, and Software:

History and economics:

Religion and philosophy:

Misc:

US flag if California splits into three states

There’s a proposal for California to split into three states. If that happens, what would happen to the US flag?

The US flag has had 13 stripes from the beginning, representing the first 13 states. The number of stars has increased over time as the number of states has increased. Currently there are 50 stars, arranged in alternating rows of 6 and 5 stars.

If California breaks into three states, how would we arrange 52 stars? One way would be alternating rows of 7 and 6: four rows of each, since 4 × 7 + 4 × 6 = 52. Here’s a quick mockup of a possible 52-star flag I just made:

52-star flag

The difference isn’t that noticeable.
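If you’d like to hunt for layouts like these yourself, here’s a little Python sketch (my own, not from the original post) that searches for arrangements made of alternating long and short rows:

    # Find layouts with r rows of a stars alternating with s rows of (a - 1) stars,
    # as on the current flag (five rows of 6 alternating with four rows of 5).
    def layouts(n_stars):
        found = []
        for a in range(3, 10):           # length of the longer rows
            for r in range(1, 10):       # number of longer rows
                for s in (r - 1, r):     # shorter rows: one fewer, or the same number
                    if s > 0 and r*a + s*(a - 1) == n_stars:
                        found.append((r, a, s, a - 1))
        return found

    print(layouts(50))   # includes (5, 6, 4, 5): the current 50-star layout
    print(layouts(52))   # includes (4, 7, 4, 6): alternating rows of 7 and 6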

What if California [1] were to secede from the US? We’ve had 49 states before, in the brief period between the beginning of Alaskan statehood and the beginning of Hawaiian statehood. During that time [2], the flag had 7 rows of 7 stars, but the rows were staggered as in the current flag, not lined up in columns as in the 48-star flag before it.

***

[1] I don’t know how many people seriously propose any state leaving the US. The last time a state tried, the result was the bloodiest war in US history. There’s a group in Vermont that wants their state to withdraw from the US.

Texans talk about seceding from the union, and you’ll occasionally see SECEDE bumper stickers, but it’s a joke. Maybe a few people take it seriously, but certainly not many.

[2] There is a tradition dating back to 1818 that the flag only changes on July 4. So the period of the 49-star flag didn’t exactly coincide with the period of 49 states.

Perl as a better grep

I like Perl’s pattern matching features more than Perl as a programming language. I’d like to take advantage of the former without having to go any deeper than necessary into the latter.

The book Minimal Perl is useful in this regard. It has chapters on Perl as a better grep, a better awk, a better sed, and a better find. While Perl is not easy to learn, it might be easier to learn a minimal subset of Perl than to learn each of the separate utilities it could potentially replace. I wrote about this a few years ago and have been thinking about it again recently.

Here I want to zoom in on Perl as a better grep. What’s the minimum Perl you need to know in order to use Perl to search files the way grep would?

By using Perl as your grep, you get to use Perl’s more extensive pattern matching features. Also, you get to use one regex syntax rather than wondering about the specifics of numerous regex dialects supported across various programs.

Let RE stand for a generic regular expression. To search a file foo.txt for lines containing the pattern RE, you could type

    perl -wln -e "/RE/ and print;" foo.txt

The Perl one-liner above requires more typing than using grep would, but you could wrap this code in a shell script if you’d like.

If you’d like to print lines that don’t match a regex, change the and to or:

    perl -wln -e "/RE/ or print;" foo.txt

By learning just a little Perl you can customize your search results. For example, if you’d like to just print the part of the line that matched the regex, not the entire line, you could modify the code above to

    perl -wln -e "/RE/ and print $&;" foo.txt

because $& is a special variable that holds the result of the latest match.

***

For daily tips on regular expressions, follow @RegexTip on Twitter.


Line art

A new video from 3Blue1Brown is about visualizing derivatives as stretching and shrinking factors. Along the way they consider the function f(x) = 1 + 1/x.

Iterations of f converge on the golden ratio, no matter where you start (with one exception). The video creates a graph where they connect values of x on one line to values of f(x) on another line. Curiously, there’s an oval that emerges where no lines cross.
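As an aside, here’s a quick numerical check of that convergence claim (my own addition, not from the video):

    # Iterate f(x) = 1 + 1/x; the fixed point is the golden ratio (1 + √5)/2.
    x = 2.0
    for _ in range(50):
        x = 1 + 1/x
    print(x)                   # approximately 1.618033988749895
    print((1 + 5**0.5) / 2)    # the golden ratio, for comparison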

Here’s a little Python I wrote to play with this:

    import matplotlib.pyplot as plt
    from numpy import linspace

    N = 70
    x = linspace(-3, 3, N)
    y = 1 + 1/x

    # Connect each x value (on the line at height 0) to f(x) (on the line at height 1).
    for i in range(N):
        plt.plot([x[i], y[i]], [0, 1])
    plt.xlim(-5, 5)
    plt.show()

And here’s the output:

In the plot above I just used matplotlib’s default color sequence for each line. In the plot below, I used fewer lines (N = 50) and made a couple of changes to the plot command: specifying the color for each line and putting the x line above the y line.

        plt.plot([y[i], x[i]], [0, 1], c="#243f6a")

If you play around with the Python code, you probably want to keep N even. This prevents the x array from containing zero.

Update: Here’s a variation that extends the lines connecting (x, 0) and (y, 1). And I make a few other changes while I’m at it.

    N = 200
    x = linspace(-10, 10, N)
    y = 1 + 1/x
    z = 2*y - x    # where the line through (x, 0) and (y, 1) reaches height 2

    for i in range(N):
        plt.plot([x[i], z[i]], [0, 2], c="#243f6a")
    plt.xlim(-10, 10)

Off by one character

There was a discussion on Twitter today about a mistake calculus students make:

\frac{d}{dx}e^x = x e^{x-1}

I pointed out that it’s only off by one character:

\frac{d}{de}e^x = x e^{x-1}

The first equation is simply wrong. The second is correct, but a gross violation of convention, using x as a constant and e as a variable.
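If you want to check the second equation mechanically, here’s a quick verification with SymPy (my addition; it treats e as an ordinary symbol, just as the joke requires):

    from sympy import symbols, diff, simplify

    # Use e as a plain positive symbol and x as the quantity held constant.
    e, x = symbols("e x", positive=True)

    d = diff(e**x, e)                      # differentiate e^x with respect to e
    print(simplify(d - x * e**(x - 1)))    # 0: confirms d/de e^x = x e^(x-1)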