A lot of smart people have a rule not to memorize anything that they can derive on the spot. That’s a good rule, up to a point. But past that point it becomes a liability.
Most students err on the side of memorizing too much. For example, it’s common for students to memorize three versions of Ohms law:
- V = IR
- I = V/R
- R = V/I.
Not only is this wasteful, tripling the number of facts to remember, it’s also error prone. When you memorize things without understanding them, you have no way to detect mistakes. Someone memorizing the list above might think “Is I = V/R or R/V?” but someone who knows what the terms mean will know that more resistance means less current, so the latter cannot be right.
I got through my first probability class in college without memorizing anything. I worked every problem from first principles, and that was OK. But later on I realized that even though I could derive things from scratch every time I needed them, doing so was slowing me down and keeping me from seeing larger patterns.
The probability example stands out in my mind because it was a conscious decision. I must have implicitly decided to remember things in other classes, but it wasn’t such a deliberate choice.
I’d say “Don’t memorize, derive” is a good rule of thumb. But once you start to say to yourself “Here we go again. I know I can derive this. I’ve done it a dozen times.” then maybe it’s time to memorize the result. To put it another way, don’t memorize to avoid understanding. Memorize after thoroughly understanding.
Related post: Just-in-case versus just-in-time
When I was studying for the computing olympiad, I wanted to implement every datastructure from scratch whenever I needed it because I wanted to be able to solve problems “from scratch.” Most people would maintain a repo of datastructures they can copy from instead. At a certain point, writing the datastructures from scratch seriously bottlenecked my progress, since I would gain little from reimplementing them but they limited the rate I could solve problems. At a certain point it became important to bypass the details to reason at a higher level of abstraction.
This is definitely true for going between matrix, vector-wise and element-wise identities in linear algebra!
I’ve always found that if I use an algorithm/formula/constant repeatedly I will remember it anyway and no longer have to derive or lookup. The process is somewhat automatic (and garbage collected for things I no longer use).
I’ve had similar experience. There are things I rarely use, and therefore have no desire to remember, and things I often use, and therefore remember without trying. But the gray zone in between is interesting: things I use often enough that I want to remember them, but not so often that I remember them without deliberately intending to.