How to get started with functional programming

Someone asked me this weekend how to get started with functional programming. My answer: Start by writing pure functions in the programming language you’re currently using.

The only input to a pure function is its argument list and the only output is its return value. If you haven’t seen this before, you might think all functions are pure. After all, any function takes in values and returns a value. But in conventional programming there are typically out-of-band ways for information to flow in or out of a function. For example, an impure function may depend on a global variable or class member data. In that case, it’s behavior is not entirely determined by its arguments. Similarly, an impure function might set a global variable or write to a database. In that case the function has a side effect in addition to its return value.

You can write pure functions in any language, though it’s easier in some languages than others. For example, no one would call Fortran a functional language, but there are people (M. J. D. Powell comes to mind) who discipline themselves to write pure functions in Fortran.

Why write pure functions? A pure function has referential transparency, meaning it will always return the same value when given the same inputs. The output does not depend on the system time, the state of a database, which functions were called previously, or anything else that is not explicitly passed as an argument to the function. This means pure functions are easier to understand (and hence easier to debug and test).

You can’t always write pure functions. If you need to stick a value in a database, this cannot be accomplished with a pure function. Or if you’re calling a random number generator, you don’t want it to have referential transparency, always returning the same output! But the goal is to use pure functions when practical. You want to eliminate out-of-band communication when it’s convenient to do so. Total purity is not practical; some argue that the sweet spot is about 85% purity.

So why don’t programmers use pure functions more often? One reason is that pure functions require longer argument lists. In an object oriented language, object methods can have shorter argument lists by implicitly depending on object state. The price to pay for shorter method signatures is that you can’t understand a method by itself. You have to know the state of the object when the method is called. Is it worthwhile to give up referential transparency in order to have shorter method signatures? It depends on your context and your taste, though in my opinion its often worthwhile to use longer function signatures in exchange for more pure functions.

Another reason people give for not writing pure functions is that its too expensive to copy large data structures to pass them into a function. But that’s what pointers are for. You can conceptually pass an object into a function without having to actually make a copy of the object’s bits.

You can also fake purity for the sake of efficiency. For example, Mike Swaim left a comment recently giving an example of how memoization sped up a program by several orders of magnitude. (Memoization is a technique of caching computations. When a function is asked to compute something, it first looks to see whether it has already done the calculation. If so, it returns the cached value. If not, it does the calculation and adds its output to the cache.) A function that uses memoization is not strictly pure — its calculations have a persistent impact on the state of its cache — but such a function can still have referential transparency, always returning the same output given the same input. You could say it’s cheating to call such functions pure, and it is, but if you’re really a stickler about it, all pure functions have side effects.

Related posts

21 thoughts on “How to get started with functional programming

  1. I wrote the program for my Master’s Thesis (a dynamic programming problem) in Turbo Pascal, so I got good at functional programming. I really miss that compiler! For its day, it was extremely fast.

    The program was not, however. On the old ‘286 clone I owned back then, I would start the run at 5 PM and get an answer by 7 the next morning.

  2. I would suggest to try and grasp the idea of higher order functions. For me this is the key idea in functional programming, one which allows building software from functions as oppose to building software from objects.

  3. A good way to get used to concept of purity is to build some spreadsheet models. The built-in functions such as SUM are pure. As a result, although they are just expressions, formulas are also “pure”. And user-defined functions (such as the kind you might write in VBA for Excel) can’t have side effects on the state of the sheet itself, although they certainly don’t have to be “pure” in other respects.

    I think this purity, plus the fact that a typical spreadsheet shows the results of intermediate calculations, is why people who otherwise consider themselves “non-programmers” are often able to build and maintain quite complex spreadsheet calculations. “Purity” is a very powerful concept even when divorced from the rest of functional programming.

  4. Alexis Gallagher

    When you say referential integrity, do you maybe mean referential transparency?

  5. s/”referential integrity”/”referential transparency”

    “Referential integrity” is a property defined in the relational data model.

  6. Alexis and Senderista: Yes, I meant to say transparency. Thanks for pointing that out. I updated the post.

  7. In an object oriented language, object methods can have shorter argument lists by implicitly depending on object state. The price to pay for shorter method signatures is that you can’t understand a method by itself. You have to know the state of the object when the method is called.

    Note that when your objects are immutable this isn’t a problem. Immutability and referential transparency are good friends. :)

  8. I’d just like to point out the “pure” keyword in Fortran that makes the compiler force you to write pure functions. Also, all “elemental” functions and subroutines are “pure” in the Fortran sense. Not that it matters for most people, but now you know!

  9. You may want to extend your definition of purity to include class member functions that don’t change the state of their instance. The compiler ist just hiding the “this”-pointer they get at runtime, but if you include it in your mind, they become pure.

    A similar argument can be made for the consumption of global variables – as long as the function does not change them. You could hold all global variables in one large object and then pass it to all functions (again, this is done implicitly by the compiler).

    Fulfills the definition, but IMHO not the spirit of purity, because you usually pass in a lot of stuff that isn’t needed by the function. This effect is more limited in case of the classes this-pointer.

  10. Another reason to use functional programming (and immutable objects) is that it makes multithreaded programming much easier. If your functions don’t have side effects (and don’t modify objects) then they’re pretty much thread safe.

  11. John,

    Is referential transparency as clear cut as you make it? You could write a method that is transparent, as you describe it, while hardly being more transparent practically by simply passing the “this” pointer as the method argument. (Pass the object as the argument of it’s own method.) Thus referential transparency can be obfuscated by abstraction.

  12. Tim: A pure function does not modify its arguments, so even if it receives a “this” pointer, it shouldn’t modify the object. But if the function returned the “this” pointer, then you’re right.

    You have a good point: You can make anything pure by passing enough in and out. For example, I said that making changes to a database negates purity. But you could write a function that takes the previous state of a database as one (enormous) input and returns the new state of the database as output.

    To be practical, a pure function needs to take in small amount of data and return a small amount of data. But if a pure function does take a huge amount of input, at least you’re being explicit about what is required.

    Of course “small” is entirely subjective. Since I’m discussing psychological advantages of purity — easier to understand, test, and debug — the idea of “small” should also be psychological. Suppose a function takes in a color photo and returns the same photo in gray scale. That’s a lot of input and output in terms of bits, but psychologically it’s just one photo in and one photo out. On the other hand, a function that takes 30 unrelated floating point numbers as input takes in a lot fewer bits, but psychologically that’s too much input to grasp.

  13. Andrew Pennebaker

    That’s not a bad idea. Lisp/ML/Prolog tend to scare people, and there’s no reason they can’t do FP-style coding in their language of choice. In particular, I’d suggest learning to love mapping and recursion. For Rubyists, use each(), select(), reject(), and inject().

  14. Self-documented code is a necessity for programmers, when going back to old code, and when handing code off to someone else. Simple, straight forward functions can be written in such a way. Regardless of the complexity of a function or any other code, it should be documented using comments.

  15. I do not program functionally but your comment about passing a pointer seems to me to break the whole concept of a pure function.

  16. SteveBrooklineMA

    Brunetti makes a point I was going to make. In fact, I have at times put a bunch of “global variables” in a structure, created one instance of that structure in main(), then passed it around to many functions, even though those functions did not all need all those variables. This is a simple way to convert kludgy code into something that technically satisfies somebody’s programming standard.

  17. “So why don’t programmers use pure functions more often? One reason is that pure functions require longer argument lists. In an object oriented language, object methods can have shorter argument lists by implicitly depending on object state.”

    This argument is just wrong. Brunetti and SteveBrooklineMA hinted at it. A function that depends on global state (more precisely, declared higher in the lexical scoping hierarchy, not necessarily at the toplevel) that doesn’t change over time is pure. That state may be any kind of context, including instance variables, a *this pointer to an immutable object, plain old global variables, etc.

    What may change the number of argument is not the dependency over some state, but representing the *mutation* of this state by passing the old state and returning the new state as output. This is about mutation, and not about having an implicit context, which is perfectly acceptable of a .pure function.

  18. Spoken like a guy with more OO in his past than FP.

    > The only input to a pure function is its argument list and the only output is its return value.
    – True-ish, but with complex, ambiguous datatypes, the world is more flexible than that, said the APL-er.

    > A pure function has referential transparency, meaning it will always return the same value when given the same inputs.
    – Not exactly – the return need not be coupled to anything external referentially, but can hold anything structurally. It becomes the (additional) responsibility of the caller to examine the return for possibly differential structure. Implicit alternate signatures (sort of OO-ish but without the OO baggage) by argument structure are ancient along with correspondingly variable structures of return contents.

    > The output does not depend on the system time…
    – Whoa! I’ve written FP functions that do no more that return the system time in a pretty format. The caller simply asked for the time with a format request – the called function asked a system function and then made that return pretty before returning to its caller.

    > This means pure functions are easier to understand (and hence easier to debug and test).
    – Easier to understand AFTER you *GET* FP…same for debugging and testing.

    > If you need to stick a value in a database, this cannot be accomplished with a pure function.
    – Yes, it can. The caller’s only responsibility is to pass good data arguments. The called function’s responsibility is to advise the caller of success or failure, maybe a “why” indicator on failure, depending on the sophistication of the caller.

    > Or if you’re calling a random number generator, you don’t want it to have referential transparency…
    – Only maybe. That’s a duty best detailed to system functions not user functions. For RNG calls…system functions stand outside the arena of the user function. Admittedly, if you need some wacky RNG implementation that provides services beyond the capacity of the default system function, you may have to write a user function, but that still rarely requires more than masking the underlying system function by some transformation.

    > So why don’t programmers use pure functions more often?
    ‘Cuz managers don’t get the value, and, since the programmer off-the-street mostly knows everything except FP, it costs money to train them. FP looks expensive, since programmer time is the most costly aspect of the IT expense equation. FP is WAY cheaper in the long run, but, well, managers…what can I say?

    > One reason is that pure functions require longer argument lists.
    Again, not in FP languages that respect complex argument structures…examination of the argument can reduce the length of the argument list to one.

    I get that your experience is in languages other mine, since these are your opinions, but really, a couple o’ decades and experience with APL lead me to WAY different conclusions.

    (some other) Richard

  19. Richard: I don’t think that you can treat functions doing database writes as “pure” functions. The database connection has hidden state, which can trip you up. One of the nice things about functional programming is that it’s thread safe, since your code doesn’t have side effects. If you try doing that with a database connection, you’re asking for trouble.

  20. As with issues such as RNG’s and timestamps, database interactions can be purely pass/fail and correspondingly FP, e.g., http://self.maluke.com/txns. RDBMS interactions need not violate thread-safety, so long as the function called notifies the caller of its failure to accomplish its duty with adequate information as to the nature of its failure. Program-by-contract is not an OO-only principle in the 21st century. On failure, it’s up to the caller to resolve the message (return) related to the failure state and to repair to a consistent state with respect to the upstream frame-stacks at large. However, I’m done here, since there was no other objection, and RDBMS issues were relatively the least and most obscure of my concerns.

  21. Pure functions and immutable variables being the core of functional programming is something that I didn’t realize until fairly recently, about a year since I started working with Haskell. I’ve been working on a Javascript codebase, and noticed just how many of my bugs turn out to have been a variable that I changed somewhere along the way. It seems like most of my debugger time is spent trying to find out what the value of a variable is at some particular point in time. And, as you said, immutability is something that I can take from Haskell and use to some degree in my Javascript.

Comments are closed.