Sampling with replacement until you’ve seen everything

Suppose you have a standard deck of 52 cards. You pull out a card, put it back in the deck, shuffle, and pull out another card. How long would you expect to do this until you’ve seen every card?

Here’s a variation on the same problem. Suppose you’re a park ranger keeping data on tagged wolves in a national park. Each day you catch a wolf at random, collect some data, and release it. If there are 100 wolves in the park, how long until you’ve seen every wolf at least once?

We’ll solve this problem via simulation, then exactly.

Simulation

The Python code to solve the problem via simulation is straightforward. Here we’ll simulate our ranger catching and releasing one of 100 wolves. We’ll repeat our simulation five times to get an idea of how much the results vary.

    import numpy as np
    from numpy.random import default_rng

    rng = default_rng()
    num_runs = 5
    N = 100 # number of unique items

    for run in range(num_runs):
        tally = np.zeros(N,dtype=int)
        trials = 0
        while min(tally) == 0:
            tally[rng.integers(N)] += 1
            trials += 1
        print(trials)

When I ran this, I got 846, 842, 676, 398, and 420.

Exact solution

Suppose we have a deck of N cards, and we sample the deck with replacement M times. We can select the first card N ways. Ditto for the second, third, and all the rest of the cards. So there are N^M possible sequences of card draws.

How many of these sequences include each card at least once? To answer the question, we look at Richard Stanley’s twelvefold way. We’re looking for the number of ways to allocate M balls into N urns such that each urn contains at least one ball. The answer is

N! S(M, N)

where S(M, N) is the Stirling number of the second kind with parameters M and N [1].

So the probability of having seen each card at least once after selecting M cards randomly with replacement is

N! S(M, N) / N^M.

Computing Stirling numbers

We’ve reduced our problem to the problem of computing Stirling numbers (of the second kind), so how do we do that?

We can compute Stirling numbers in terms of binomial coefficients as follows.

S(n,k) = \frac{1}{k!}\sum_{i = 0}^k (-1)^i {k \choose i} (k-i)^n

Now we have a practical problem if we want to use this formula for larger values of n and k. If we work with floating point numbers, we’re likely to run into overflow. This is a perennial problem with probability calculations: you may need to compute a moderate-sized number as the ratio of two enormous numbers. Even though the final result is between 0 and 1, the intermediate results may be too large to store in a floating point number. And even if we don’t overflow, we may lose precision because we’re working with an alternating sum.

The usual way to deal with overflow is to work on a log scale. But that won’t work for Stirling numbers because we’re adding terms together, not multiplying them.

One way to solve our problem—not the most efficient way but a way that will work—is to work with integers. This is easy in Python because integers by default can be arbitrarily large.

There are two ways to compute binomial coefficients in Python: scipy.special.binom and math.comb. The former uses floating point operations and the latter uses integers, so we need to use the latter.
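To see the difference, here’s a quick illustration; the argument 2000 below is arbitrary, just large enough to cause trouble for floating point.

    from math import comb
    from scipy.special import binom

    n = 2000
    print(binom(n, n // 2))            # the true value is around 1e600, far beyond double range, so this overflows to inf
    print(len(str(comb(n, n // 2))))   # number of digits in the exact integer result, no trouble at all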

We can compute the numerator of our probability with

    from math import comb
    def k_factorial_stirling(n, k):
        return sum((-1)**i * comb(k, i)*(k-i)**n for i in range(k+1))    

Then our probability is

    k_factorial_stirling(M, N) / N**M
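As a quick sanity check, take two cards and two draws: of the four equally likely sequences, only 12 and 21 cover both cards, so the probability should be 1/2.

    print(k_factorial_stirling(2, 2) / 2**2)   # 0.5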

If we compute the probability that our ranger has seen all 100 wolves after catching and releasing 500 times, we get 0.5116. So the median number of catch and release cycles is somewhere around 500, which is consistent with our simulation above.

Note that the numerator and denominator in our calculation are both on the order of 10^1000 while the largest floating point number is on the order of 10^308, so we would have overflowed if we’d used binom instead of comb.
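To pin the median down exactly, one could scan values of M until the probability first reaches 1/2, since the probability is nondecreasing in M. Here is a brute-force sketch reusing k_factorial_stirling from above.

    N = 100
    M = N   # we can't possibly have seen all N items in fewer than N draws
    while k_factorial_stirling(M, N) / N**M < 0.5:
        M += 1
    print(M)   # the smallest M for which the probability is at least 1/2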


[1] It’s unfortunate that these were called the “second kind” because they’re the kind that come up most often in application.

Image: “Fastening a Collar” by USFWS Mountain Prairie is licensed under CC BY 2.0 .

Regular expressions and successive approximation

Regular expressions can do a lot of tasks in practice that they cannot do in theory. That’s because a particular application of regular expressions comes with context and with error tolerance.

For example, much has been said about how regular expressions cannot parse HTML. This is strictly true, but it says nothing about how well your particular regular expression might work for your task.

Maybe you’re searching HTML generated by a particular person or program, not all potential HTML documents, and a regular expression can find what you’re looking for in your idiomatic subset of HTML.

Types of errors

If you’re searching for a pattern, you can err by finding something that isn’t a match (a false positive) or by failing to find something that is a match (a false negative).

False positives are often not a problem. A very common use for regular expressions is to filter a huge amount of text down to a small amount of text to visually inspect. Maybe your first attempt at using a regular expression to find a needle in a haystack returns a smaller haystack. Then you refine your regular expression until it returns a small number of sharp pointy things in addition to the needle you’re after.

False negatives are more of a problem. If you can’t find what you’re looking for, you generalize your regular expression until you run into the false positive problem instead. If you still can’t find what you’re looking for, then you have to wonder whether the thing you’re after exists. Proving a negative is hard in general.

But even false negatives may not be a problem. Maybe there are a dozen needles in your haystack, but you don’t need to find all of them; you just need to find one of them.

In my work in data privacy I’ve needed to test whether personal information is somewhere it’s not supposed to be. If my regular expression finds any personal information, then I can tell my client they have a problem. If I don’t find any, I have more work to do.

Example: call signs

The FCC has fairly complicated rules for assigning amateur radio call signs. For starters,

Each call sign has a one letter prefix (K, N, W) or a two letter prefix (AA-AL, KA-KZ, NA-NZ, WA-WZ) and a one, two, or three letter suffix separated by a numeral (0-9) indicating the geographic region.

This is enough information to craft a regular expression that has no false negatives, but may have some false positives. For example, you might start with

    [A-Z]+[0-9][A-Z]+

which matches a string of capital letters, followed by a digit, followed by another string of capital letters. Maybe that’s good enough. But you could get more specific, such as

    [AKNW][A-Z]?[0-9][A-Z]{1,3}

which comes closer to duplicating the quotation above from the FCC rules: one of A, K, N, or W, optionally followed by another letter, followed by a digit, followed by between one and three letters. But you could do better:

    \b(A[A-L]|[KNW][A-Z])[0-9][A-Z]{1,3}\b

This adds the restriction that our pattern appear on word boundaries, and captures the restriction on letters that can follow ‘A’.
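For instance, here is one way to try a candidate pattern against a handful of strings in Python; the sample strings below are made up for illustration.

    import re

    pattern = r"\b(A[A-L]|[KNW][A-Z])[0-9][A-Z]{1,3}\b"
    samples = ["KA9ABC", "AL7X", "AZ3Q", "WB2XYZ", "not a call sign"]

    for s in samples:
        print(s, "->", "match" if re.search(pattern, s) else "no match")

Under this pattern the third string is rejected because Z is not allowed to follow A.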

You could keep going, crafting ever more complicated regular expressions that reduce your false positive rate. The FCC has a lot more restrictions than the ones quoted above, and you could incorporate some of these into your expression.

But false positives are inevitable. The ultimate determinant of what is a valid call sign is the FCC database of call signs, which changes continually. You refine your expression until you hit diminishing returns. If you need more precision than that, don’t use regular expressions.

Prototype vs Perfect

I get pushback every time I write about regular expressions. I try to make it clear that I’m talking about quick and useful approximations, but perhaps I should do more to emphasize that. I suppose critics of regular expressions are reacting to enthusiasts who want to use regular expressions for everything.

It’s often possible to create a 99% solution in 1/1000 of the time it would take to create a 100% solution. Whether or not 99% is good enough depends on context.


Non-associative multiplication

There are five ways to parenthesize a product of four things:

  • ((ab)c)d
  • (ab)(cd)
  • a(b(cd))
  • (a(bc))d
  • a((bc)d)

In a context where multiplication is not associative, the five products above are not necessarily the same. Maybe all five are different.

This post will give two examples where the products above are all different: octonions and matrix multiplication.

If you’re thinking “But wait: matrix multiplication is associative!” then read further and see in what sense I’m saying it’s not.

Octonions

Octonion multiplication is not associative. I wrote a blog post a while back asking how close the octonions come to being associative. That is, if we randomly generate unit-length octonions a, b, and c, we can calculate the norm of

(ab)c − a(bc)

and ask about its expected value. Sometimes for a triple of octonions this value is zero, but on average this expression has norm greater than 1. I estimated the average value via simulation, and later Greg Egan worked out the exact value. His write-up went down with the Google+ ship, but recently Greg posted a new version of his notes.

In this post I gave Python code for multiplying octonions using a function called CayleyDickson, named after the originators of the algorithm. Let’s rename it m to have something shorter to work with, and compute all five ways of associating a product of four octonions.

    import numpy as np

    def m(x, y):
        return CayleyDickson(x, y)

    def take_five(a, b, c, d):
        return [
            m(a, m(b, m(c, d))),
            m(m(a, b), m(c, d)),
            m(m(m(a, b), c), d),
            m(a, m(m(b, c), d)),
            m(m(a, m(b, c)), d),
        ]
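The CayleyDickson function isn’t defined in this snippet; it comes from the earlier post. For completeness, here is a minimal sketch of such a function using the recursive Cayley–Dickson doubling formula under one common sign convention (the version in the earlier post may differ in details).

    import numpy as np

    def conj(x):
        # Cayley-Dickson conjugate: negate every component except the first
        xstar = -np.asarray(x, dtype=float)
        xstar[0] *= -1
        return xstar

    def CayleyDickson(x, y):
        # Doubling formula: (a, b)(c, d) = (ac - d*b, da + bc*),
        # where * denotes conjugation. Applied recursively to length-8
        # arrays, this multiplies octonions.
        n = len(x)
        if n == 1:
            return x * y
        half = n // 2
        a, b = x[:half], x[half:]
        c, d = y[:half], y[half:]
        z = np.zeros(n)
        z[:half] = CayleyDickson(a, c) - CayleyDickson(conj(d), b)
        z[half:] = CayleyDickson(d, a) + CayleyDickson(b, conj(c))
        return z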

I first tried products of basis elements and got only two different products out of the five ways of associating multiplication, though I only tried a few examples. However, when I tried four random octonions, I got five different products.

    np.random.seed(20220201)

    a = np.random.rand(8)
    b = np.random.rand(8)
    c = np.random.rand(8)
    d = np.random.rand(8)

    for t in take_five(a, b, c, d):
        print(t)

This gave a very interesting result: the five products were all different, differing in their last seven components, but their real (first) components took only two distinct values.

    [ 2.5180856  -2.61618184  ...]
    [ 2.5180856   0.32031027  ...]
    [ 2.5180856   1.13177500  ...]
    [ 3.0280984  -0.30169446  ...]
    [ 3.0280984  -2.36523580  ...]

I repeated this experiment a few times, and the first three results always had the same real component, and the last two results had another real component.

I suspect there’s a theorem that says

Re( ((ab)c)d ) = Re( (ab)(cd) ) = Re( a(b(cd)) )

and

Re( (a(bc))d ) = Re( a((bc)d) )

but I haven’t tried to prove it. If you come up with a proof, or a counterexample, please post a link in the comments.

Matrix multiplication

Matrix multiplication is indeed associative, but the efficiency of matrix multiplication is not. That is, any two ways of parenthesizing a matrix product will give the same final matrix, but the costs of the various groupings are not the same. I first wrote about this here.

This is a very important result in practice. Changing the parentheses in a matrix product can make the difference between a computation being practical or impractical. This comes up, for example, in automatic differentiation and in backpropagation for neural networks.

Suppose A is an m by n matrix and B is an n by p matrix. Then AB is an m by p matrix, and forming the product AB requires mnp scalar multiplications. If C is a p by q matrix, then computing (AB)C takes

mnp + mpq = mp(n + q)

scalar multiplications, but computing A(BC) takes

npq + mnq = nq(m + p)

scalar multiplications, and in general these are not equal.

Let’s rewrite our multiplication function m and our take_five function to compute the cost of multiplying four matrices of random size.

We’ve got an interesting programming problem in that our multiplication function needs to do two different things. First of all, we need to know the size of the resulting matrix. But we also want to keep track of the number of scalar multiplications the product would require. We have a sort of main channel and a side channel. Having our multiplication function return both the dimension and the cost would make composition awkward.

This is kind of a fork in the road. There are two ways to solve this problem, one high-status and one low-status. The high-status approach would be to use a monad. The low-status approach would be to use a global variable. I’m going to take the low road and use a global variable. What’s one little global variable among friends?

    mults = 0

    def M(x, y):
        # x and y are (rows, columns) pairs. Multiplying an m×n matrix by an
        # n×p matrix takes mnp scalar multiplications and yields an m×p matrix.
        global mults
        mults += x[0]*x[1]*y[1]
        return (x[0], y[1])

    def take_five2(a, b, c, d):

        global mults
        costs = []

        mults = 0; M(a, (M(b, M(c, d)))); costs.append(mults)
        mults = 0; M(M(a, b), M(c, d));   costs.append(mults)
        mults = 0; M(M(M(a, b), c), d);   costs.append(mults)
        mults = 0; M(a, M(M(b, c), d));   costs.append(mults)
        mults = 0; M(M(a, M(b, c)), d);   costs.append(mults)

        return costs 

Next, I’ll generate five random integers, and group them in pairs as sizes of matrices conformable for multiplication.

    # random_integers returns values from 1 to 10 inclusive. It is deprecated
    # in newer versions of NumPy; np.random.randint(1, 11, size=5) is the
    # modern equivalent.
    dims = np.random.random_integers(10, size=5)
    dim_pairs = zip(dims[:4], dims[1:])
    c = take_five2(*[p for p in dim_pairs])

When I ran this dims was set to

[3 9 7 10 6]

and so my matrices were of size 3×9, 9×7, 7×10, and 10×6.

The number of scalar multiplications required by each way of multiplying the four matrices, computed by take_five2 was

[960, 735, 579, 1332, 1080]

So each grouping took a different number of operations. The slowest approach would take more than twice as long as the fastest. In applications, matrices can be very tall and thin, or very short and wide, in which case one approach may take orders of magnitude more operations than another.
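To see how extreme the difference can get, here’s a quick illustration using the M and take_five2 functions above, with dimensions made up to exaggerate the effect: alternating long rows and tall columns.

    # A 1x1000 row, a 1000x1 column, another row, another column.
    shapes = [(1, 1000), (1000, 1), (1, 1000), (1000, 1)]
    print(take_five2(*shapes))

The two groupings that form the 1000×1000 product bc cost roughly two million scalar multiplications, while (ab)(cd) gets by with roughly two thousand, about a thousandfold difference.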


Non-equivalent floating point numbers

Here’s an interesting paragraph from Handbook of Floating Point Arithmetic by Jean-Michel Muller et al.

Many common transformations of a code into some naively equivalent code become impossible if one takes into account special cases such as NaNs, signed zeros, and rounding modes. For instance, the expression x + 0 is not equivalent to the expression x if x is -0 and the rounding is to nearest.

Wait, what?!

First let’s see what this paragraph is not saying. What does the following code print?

    #include <stdio.h>

    int main()
    {
        double t = 1e-200;
        double x = -1e-200 * t; // x == -0
        double y = x + 0;

        printf( (x == y) ? "0 == 0\n" : "0 != 0\n");
    }

I compiled the code above with gcc using default options, which means rounding to nearest.

It prints 0 == 0 as you would have guessed if you weren’t spooked by the quotation above.

Let’s go back and read that paragraph more carefully. It speaks of equivalent code. In the code above, the variables x and y are equal in the sense that x == y returns true, but that does not mean that x and y are equivalent. The two variables are not equal bit-for-bit and so some code could behave differently when given one rather than the other.

Let’s define

    void hexdump(double x)
    {
        // Reinterpret the bits of the double as an integer.
        // (Assumes unsigned long is 64 bits, as with gcc on a 64-bit Linux box.)
        unsigned long* pu;
        pu = (unsigned long*)&x;
        printf("%lx\n", *pu);
    }

and add the lines

    hexdump(x);
    hexdump(y);    

to main.

This prints the following.

    8000000000000000
    0

We see that x has its top bit set but y does not. More on why here.
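The same distinction is visible from Python, whose floats are also IEEE 754 doubles. Here’s a sketch, separate from the C example above.

    import math, struct

    x = -0.0
    y = x + 0.0                        # rounds to +0.0 under round-to-nearest
    print(x == y)                      # True: the two compare equal
    print(math.copysign(1.0, x),       # -1.0: x carries a minus sign
          math.copysign(1.0, y))       #  1.0: y does not
    print(struct.pack(">d", x).hex())  # 8000000000000000
    print(struct.pack(">d", y).hex())  # 0000000000000000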


Permutations at the command line

Yesterday I wrote about how you could use the tr utility to find the dual of a modal logic expression. Today I’ll show how you could use tr to work with permutations.

This post is a hack, and that’s the fun of it.

In the logic post we were using tr to swap characters. More generally, tr can permute characters. As mentioned before, it’s key that replacements are conceptually simultaneous. Otherwise none of this would work.

Problem 1

Here’s a homework problem from Abstract Algebra by Dummit and Foote.

Let σ be the permutation

1 → 3, 2 → 4, 3 → 5, 4 → 2, 5 → 1

and let τ be the permutation

1 → 5, 2 → 3, 3 → 2, 4 → 4, 5 → 1.

Find the cycle decompositions of each of the following permutations: σ, τ, σ², στ, τσ, and τ²σ.

We won’t do all the parts of this problem, because that would be tedious. But we’ll do σ and στ.

A decomposition of a permutation factors the permutation into smaller permutations if possible. The algorithm to decompose a permutation starts by picking any element and finding how it cycles. Then it picks another element not in that cycle and finds its cycle. This repeats until we run out of elements. The original permutation is then equal to the product of these cyclic permutations.

Permutation σ

We might as well start with 1. Let’s see where it goes. In the problem above, σ sends the digits 12345 to the digits 34521.

    echo 1 | tr 12345 34521

This prints 3. No surprise: 1 goes to 3.

Now where does 3 go?

    echo 1 | tr 12345 34521 | tr 12345 34521

This says 5. And where does 5 go?

    echo 1 | tr 12345 34521 | tr 12345 34521 | tr 12345 34521

It goes back to 1. So our first cycle is

(1 3 5)

Now let’s pick a number not in this cycle, say 2.

    echo 2 | tr 12345 34521

So 2 goes to 4, and there’s no place left for 4 to go except back to 2. This means σ has the decomposition

(1 3 5)(2 4)

Permutation στ

Now let’s do στ. This means do τ then do σ. At least that’s how I’m reading it, like function composition; some folks use the opposite convention.

Composing permutations corresponds to piping one tr command into another. So we could see where 1 goes under στ as follows.

    echo 1 | tr 1235 5321 | tr 12345 34521

This returns 1, so 1 is a fixed point. So our decomposition starts with

(1).

OK, so what about 2? We could find out with

    echo 2 | tr 1235 5321 | tr 12345 34521 | tr 1235 5321 | tr 12345 34521

but our commands are getting kinda long, and could get longer still.

Another approach would be to compute στ by seeing what it does to all the numbers at once, not just one at a time.

    echo 12345 | tr 1235 5321 | tr 12345 34521

returns 15423. So we could see where 2 goes using

    echo 2 | tr 12345 15423

which tells us 2 goes to 5. We could see where 5 goes by adding another copy of our pipe to tr on the end, or by changing echo 2 to echo 5. In any case 5 goes to 3, and 3 goes to 4.

So στ has the decomposition

(1)(2 5 3 4).

Problem 2

The next homework problem in Dummit and Foote asks for the decomposition of a permutation of size 15:

1 → 13, 2 → 2, 3 → 15, …

Our trick for using tr won’t work without some modification because

    tr 123… 13215…

would send 1 to 1, 2 to 3, 3 to 2, etc. However, we could do the problem by representing our numbers in hexadecimal.

    tr 123… D2F…

In fact, the full permutation sends the numbers 1 through F to D2FEA6C341795B8.

Let’s define a shell alias to make things easier:

    alias t='tr 123456789ABCDEF D2FEA6C341795B8'

Now we can find the cycle of 1 with the following commands.

    echo 1 | t
    echo 1 | t | t
    echo 1 | t | t | t
    echo 1 | t | t | t | t

which tells us 1 is part of the cycle

(1 D 5 A)

Similarly we can find that 2 is a fixed point and 3 belongs to the cycle

(3 F 8)

and the final decomposition is

(1 D 5 A)(2)(3 F 8)(4 E B 7 C 9)(6)
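As a cross-check, the decomposition algorithm described earlier fits in a few lines of Python. This is a sketch; the function name and the string representation of the permutation are my own.

    def cycle_decomposition(domain, image):
        # Follow each not-yet-visited element around its cycle.
        send = dict(zip(domain, image))
        seen, cycles = set(), []
        for start in domain:
            if start in seen:
                continue
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = send[x]
            cycles.append("(" + " ".join(cycle) + ")")
        return "".join(cycles)

    print(cycle_decomposition("123456789ABCDEF", "D2FEA6C341795B8"))

This prints the same decomposition as above, fixed points included.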


Word problems, logic, and regular expressions

Word problems

Suppose you have a sequence of symbols and a set of rewriting rules for replacing some patterns of symbols with others. Now you’re given two such sequences. Can you tell whether there’s a way to turn one of them into the other?

This is known as the word problem, and in general it’s undecidable: no program can solve every instance, though particular instances can be solved. We’ll look at a word problem that can be solved with a few regular expressions.

Modal logic

Basic modal logic has two modal operators, □ (“box”) and ◇ (“diamond”), which can be strung together into sequences. Depending on the axioms of your modal logic, there may be infinitely many non-equivalent sequences of boxes and diamonds.

In the axiom system S4, every non-empty sequence of boxes and diamonds is equivalent to one of six possibilities:

  • □
  • ◇
  • □◇
  • ◇□
  • □◇□
  • ◇□◇

An arbitrary sequence of boxes and diamonds can be reduced to one of the forms above by applying the following rules:

  • □□ → □
  • ◇◇ → ◇
  • □◇□◇ → □◇
  • ◇□◇□ → ◇□

Regular expressions

We can apply the reduction rules above using regular expressions with the following Perl code.

    use utf8;

    $_ = "□□◇□◇◇◇◇□□";

    s/□+/□/g;
    s/◇+/◇/g; 
    s/(□◇)+/□◇/g; 
    s/(◇□)+/◇□/g;

    print;

The directive use utf8; tells Perl to be prepared for non-ASCII characters, namely boxes and diamonds. In Perl, $_ is the implicit variable; all the following substitution commands will modify this variable, and the print statement will output the final value of this variable.

The first substitution replaces one or more consecutive boxes with one box and the second does the analogous substitution for consecutive diamonds. The third and fourth substitution commands replace repetitions of □◇ or ◇□ with a single instance.

The script above outputs

□◇□

meaning that

□□◇□◇◇◇◇□□p ⟷ □◇□p

is a theorem in S4.
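If you’d rather stay in Python, the same reduction can be done with the re module. Here’s a sketch that mirrors the Perl script above and prints the same result.

    import re

    s = "□□◇□◇◇◇◇□□"
    s = re.sub("□+", "□", s)
    s = re.sub("◇+", "◇", s)
    s = re.sub("(□◇)+", "□◇", s)
    s = re.sub("(◇□)+", "◇□", s)
    print(s)   # prints □◇□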

Word problems can’t always be solved using regular expressions, or any other programming technique, but this one could.


Better parts, worse system

There’s a rule of systems thinking that improving part of a system can often make the system as a whole worse.

One example of this is Braess’ Paradox that adding roads can make traffic worse, and closing roads can improve traffic.

Another example is the paradox of enrichment: Increasing the food available to an ecosystem may lead to instability, and even to extinction.

Richard Hamming put it this way in what he called the first rule of systems engineering:

If you optimize the components, you’ll probably ruin the system performance.

For more variations on this principle, see this Twitter thread.

I’ve been thinking about this lately in regard to software tools. If you’re just starting out and ask around for the best way to do this and that, you might get individual bits of good advice that add up to bad advice.

“You’re still using X? You should be using Y. Y’s better.”

When someone says Y is “better” they probably mean that it has more features. It may also be objectively better by several other criteria. But that doesn’t mean it’s better for you, at this point in time, given your experience, your temperament, and your project.

The Dreyfus model of skill acquisition says that beginners want context-free rules: Y is better than X. That’s an understandable desire, and people are all too willing to give context-free advice. But context matters. A lot.

Yesterday I needed to do a little search-and-replace text munging, join two data sets, and compute some basic statistics. I thought about how I could go off on tangents for each task, using the best tool for each task individually, and take forever to get my job done. Instead, I cobbled something together quickly at a command line, with each step being far from the most general solution.

If you choose the best (i.e. biggest) tool for each task, you’ll get affirmation, or at least avoid criticism, for each choice. And you’ll likely end up with an unwieldy hodgepodge of tools, none of which you’re confident in using. If instead you think carefully about your abilities and your needs, you’ll gradually assemble a collection of tools that work together for you, making you more confident and productive.

Random sampling to save money

I was stunned when my client said that a database query that I asked them to run would cost the company $100,000 per year. I had framed my question in the most natural way, not thinking that at the company’s scale it would be worth spending some time thinking about the query.

Things have somewhat come full circle. When computers were new, computer time was expensive and programmer time was dirt cheap by comparison. Someone might be scolded for using computer time to do what a programmer could do. Now of course programmer time is expensive and computer time is cheap. Usually.

But when dealing with gargantuan data sets, the cost of computer time might matter a great deal, especially when renting services from a cloud provider. Maybe you can find a more clever way to run an enormous query. Or even better, maybe you can find a way to avoid running an enormous query.

One way to save time and money is to base decisions on random samples rather than exhaustive queries. This requires a little preparation. How big a sample do you need? How exactly are you going to take a sample? What kind of uncertainty do you have in your result when you’re done? You can afford to think about these questions a long time if it saves tens of thousands of dollars per year.
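For example, if the decision hinges on estimating a proportion, the usual normal-approximation formula gives a back-of-the-envelope sample size. Here’s a sketch, assuming a 95% confidence level and worst-case variance.

    from math import ceil

    def sample_size(margin_of_error, z=1.96, p=0.5):
        # n = z^2 p (1 - p) / E^2: standard sample size for estimating a proportion
        return ceil(z**2 * p * (1 - p) / margin_of_error**2)

    print(sample_size(0.01))   # 9604 rows for a +/-1% margin of error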

Computational asceticism

A while back I wrote about computational survivalism, being prepared to work productively in a restricted environment. The first time I ran into computational survivalism was when someone said to me “I prefer Emacs, but I use vi because I only want to use tools I can count on being installed everywhere.” I thought that was odd at the time, but I’m more sympathetic now that I’m occasionally thrown into a client’s environment where I can’t install software.

There are several variations on this philosophy. The version quoted above is somewhat extreme, wanting to use only what’s installed everywhere. A more moderate version would be to say you want to be fluent with lowest-common-denominator tools in addition to the tools you prefer to use when available. This is similar to the idea that web applications should degrade gracefully for less capable browsers.

Computational asceticism is the idea that there’s something not just pragmatic but actually virtuous about using minimal tools. I am not sympathetic to this view, but I can see how it develops. It’s sort of a variation on making a virtue out of necessity (or at least an imagined necessity), or a kind of technological Stockholm syndrome.

There’s a variation on computational askesis that behaviorally looks the same but has a different motivation. Someone could say there is nothing inherently virtuous about using minimal tools, but doing so is simpler than moving back and forth between two sets of tools. Using minimal tools all the time guarantees you’re fluent in the tools when the time comes.

I suppose it matters what sort of scenario you’re preparing for. If you anticipate being in a Jurassic Park scenario where you can save the world by being able to come up to speed quickly on an unfamiliar Unix system, then using only minimal tools all the time is good preparation.

"Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee."

Most of us will never have to be productive on a time scale of seconds. But more will want to make a good impression, or at least avoid embarrassment, if they have to use a command line live in front of coworkers or clients.

One thing to say in favor of computational askesis is that it’s stable [1]. Trying to use minimal tools occasionally is sort of an unstable equilibrium: it requires effort to maintain.

This is all a sort of gray zone problem. There are things you use so often that you maintain fluency without trying, and things that you use so seldom that you’re content to look up when needed. What do you do in the gray zone of things you use just often enough that you’d like to be more fluent in using them but not so often that you remember them without trying? It all depends on your “threat model,” what you’re preparing for.

I’ve written a couple blog posts that explore this idea further.

Update

Some more thoughts on gray zone problems.

One strategy is to practice nothing and memorize nothing. If you don’t use something often enough to remember it without trying, it’s not that important.

The opposite strategy is to try to remember everything and spend your life in endless preparation.

The former is a viable option. The latter is not because there’s no end to what might potentially be useful.

The optimal approach might be 10% of the way between the two extremes. Going much further down that continuum is exhausting.

The strategy to practice nothing and memorize nothing may be fine if you’re in a stimulating environment. If you’re in a dull environment and want to prepare for something else, you need a more active strategy.

You have to be very selective in the list of skills you want to improve or even maintain. Maybe it’s best to be less selective at the beginning of your career and more selective as you mature and gain responsibilities.

 

[1] Though there’s the problem of defining exactly what “minimal” means. There’s nothing that is absolutely guaranteed to be on every computer, so you have to set some threshold, say, using tools that are 99% sure to be there.

Deleting reproducible files in Emacs dired

Imagine you could list the contents of a directory from a command line, and then edit the text output to make things happen. That’s sorta how Emacs dired works. It’s kind of a cross between a bash shell and the Windows File Explorer. Why would you ever want to use such a bizarre hybrid?

One reason is to avoid context switching. If you’re editing a file, you can pop over to a new buffer that is your file manager, do what you need to do, then pop back, all without ever leaving Emacs.

Another reason is that, as with everything else in Emacs, a dired buffer is just text, and so the same editing commands can be used everywhere. (More on that here.)

Even though I use Emacs daily, and even though I can make a case for why dired is great, I don’t use it that much. Or rather, I don’t use that much of what it can do. The good that I would, I do not.

I was reviewing dired’s features and discovered something very handy: typing %& will flag for deletion files that can easily be created again. In particular, it will flag the byproducts of compiling a LaTeX file.

For example, the following is a screenshot of a dired buffer.

When I type %& it highlights the LaTeX temp files in red and marks them for deletion. (The D’s in the left column indicate files to be deleted.)

This doesn’t delete the files yet; it only flags them. If I then type x, the flagged files will be deleted.

In addition to the unneeded LaTeX files, it also highlighted a .bak file. However, it did not highlight the .o object file. I suppose the thought was that most people would manage C programs with a makefile. I’m sure the class of files to mark is configurable, like everything else in Emacs.
