Agile software development and homotopy

One of the things I learned from my tenure as a software project manager was that a project is more likely to succeed if there’s a way to get where you want to go continuously. You want to move a project from A to B gradually, keeping a working code base all along the way. At the end of each day, the software may not be fully functional, but it should at least build. Anything that requires a big bang change, tearing the system apart for several days and putting it back together, is less likely to succeed.

This is very much like the idea of homotopy from topology, a continuous deformation of one thing into another. No discontinuities along the way — no ripping, no jumping suddenly from one thing to another.


Group projects

The best teams have people with complementary skills, but similar work ethic. Academic assignments are the opposite. There’s not much variation in skills, in part because students haven’t yet developed specialized skills, and in part because students are in the same class because they have similar interests. The biggest variation is likely to be work ethic. It’s not uncommon for the hardest working person in a group to do 10x as much work as the laziest person in the group. The person doing most of the work learns that it’s best to avoid working with teams.

Working with people with complementary skills is a blast, but you’re unlikely to experience that in an academic project. You might get some small degree of specialization. Maybe one of the mechanical engineers on a project has more artistic ability than the other mechanical engineers, for example. But this is hardly like the experience of working with a team of people who are all great at different things.

 

Fair division and the Thue-Morse sequence

Suppose two captains, A and B, are choosing people for their teams. To make things fair, the two captains alternate choices: A, B, A, B, etc. This is much better than simply letting A choose his team first and leaving B the dregs, but it still gives A a substantial advantage. If each captain picks the best remaining player on each turn, A will get the best player in each pair of choices.

How could we do better? One proposed strategy is the Thue-Morse sequence. Someone has to choose first, so let’s say it’s A. Then B chooses next. At this point A is still at an advantage, so we let B choose again before giving A the next pick. So the first four choices are ABBA. The next four choices reverse this: BAAB. Then the next eight choices reverse the pattern of the first eight choices. So the first 16 choices are ABBABAABBAABABBA. This process has been called “taking turns taking turns.”

In terms of 0’s and 1’s, the sequence is defined by t₀ = 0, t₂ₙ = tₙ, and t₂ₙ₊₁ = 1 − t₂ₙ.
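
Here’s a minimal Python sketch (my own, not part of the original discussion) that generates terms from this recurrence and confirms the ABBA pattern above:

    def thue_morse(n):
        # The first 2^k terms followed by their complement give the
        # first 2^(k+1) terms, which is equivalent to the recurrence.
        t = [0]
        while len(t) < n:
            t += [1 - x for x in t]
        return t[:n]

    print(''.join('AB'[x] for x in thue_morse(16)))
    # prints ABBABAABBAABABBA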

How well does this procedure work? Let’s simulate it by generating random values representing skill levels. We’ll sort the values and assign them to the two teams using the Thue-Morse sequence and look at the difference between the total point values on each team. Equivalently, we can sum the values, alternating the signs of the terms according to the Thue-Morse sequence.

    import numpy as np

    # First four terms of the Thue-Morse sequence, written as +1/-1 signs
    TM  = np.array([1, -1, -1, 1])

    # Simple alternating signs: A, B, A, B
    Alt = np.array([1, -1, 1, -1])

    N = 1000000 # number of simulations

    y = TM # change y to Alt to swap out strategies
    total = 0
    for _ in range(N):
        # four random skill levels, sorted best to worst
        x = sorted(np.random.random(4), reverse=True)
        total += sum(x*y)
    print(total/N)

When we use the alternating sequence, there’s an average skill difference between the two teams of around 0.4. This is a huge imbalance relative to expected total skill of 2. (Each skill is a uniform random value between 0 and 1, so the total skill of all four players is 2 on average.)

When we use the Thue-Morse sequence, the average difference in my simulation was 0.000144. With one million simulations, our results are good to about three decimal places [1], and so the difference is indistinguishable from 0 to the resolution of our simulation. (When I repeated the simulation I got -0.000635 as the average difference.)

There are several ways to explore this further. One would be to work out the exact expected values analytically. Another would be to look at distributions other than uniform.

* * *

[1] The error in the average of N simulations is on the order of 1/√N by the central limit theorem.

Positive polynomials and squares

If a real polynomial in one variable is a sum of squares, it obviously cannot be negative. For example, the polynomial

p(x) = (x² − 3)² + (x + 7)²

is obviously never negative for real values of x. What about the converse: If a real polynomial is never negative, is it a sum of squares? Yes, indeed it is.

What about polynomials in two variables? There the answer is no. David Hilbert (1862–1943) knew that there must be positive polynomials that are not a sum of squares, but no one produced a specific example until Theodore Motzkin in 1967. His polynomial

p(x, y) = 1 − 3x²y² + x²y⁴ + x⁴y²

is never negative but cannot be written as a sum of squares of polynomials. (Nonnegativity follows from the AM–GM inequality applied to the three terms x⁴y², x²y⁴, and 1.) Here’s a plot:

Motzkin polynomial

Source: Single Digits
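
As a quick numerical sanity check, here’s a short sketch (my addition) that evaluates the Motzkin polynomial on a grid and confirms it never dips below zero:

    import numpy as np

    def motzkin(x, y):
        return 1 - 3*x**2*y**2 + x**2*y**4 + x**4*y**2

    # Evaluate on a grid; the grid includes the minima at |x| = |y| = 1
    x, y = np.meshgrid(np.linspace(-2, 2, 401), np.linspace(-2, 2, 401))
    print(motzkin(x, y).min())  # prints 0.0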

 

Sparsely populated zip codes

The dormitory I lived in as an undergraduate had its own five-digit zip code at one time. It was rumored to be the largest dorm in the US, or maybe the largest west of the Mississippi, or something like that. There were about 3,000 of us living there. Although the dorm had enough people to justify its own zip code—some zip codes have far fewer people—zip code boundaries were redrawn so that the dorm now shares its zip code with other areas.

Some zip codes are so sparsely populated that people living in these areas are relatively easy to identify if you have other data. The so-called Safe Harbor provision of HIPAA (Health Insurance Portability and Accountability Act) says that it’s usually OK to include the first three digits of someone’s zip code in de-identified data. But there are 17 areas so thinly populated that even listing the first three digits of their zip code is considered too much of an identification risk. These are the areas where the first three digits of the zip code are:

  • 036
  • 059
  • 063
  • 102
  • 203
  • 556
  • 692
  • 790
  • 821
  • 823
  • 830
  • 831
  • 878
  • 879
  • 884
  • 890
  • 893

This list could change over time. These are the regions that currently contain fewer than 20,000 people, the criterion given in the HIPAA regulations.

Knowing that someone is part of an area containing 20,000 people hardly identifies them. The concern is that in combination with other information, zip code data is more informative in these areas.
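
A de-identification script might enforce this part of Safe Harbor along the lines of the following sketch (the code and names are mine; the rule of replacing restricted prefixes with 000 is from the regulation):

    # Three-digit zip prefixes too sparsely populated to disclose
    RESTRICTED = {"036", "059", "063", "102", "203", "556", "692", "790",
                  "821", "823", "830", "831", "878", "879", "884", "890",
                  "893"}

    def safe_harbor_zip(zipcode):
        # Keep the first three digits unless the area is restricted,
        # in which case Safe Harbor says to report 000 instead.
        prefix = zipcode[:3]
        return "000" if prefix in RESTRICTED else prefix

    print(safe_harbor_zip("10282"))  # prints 000 since 102 is restricted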

Related post: Bayesian clinical trials in one zip code


Category theory and Koine Greek

Fragment of the Gospel of John in Greek

When I was in college, I sat in on a communication workshop for Latin American preachers. This was unusual since I’m neither Latin American nor a preacher, but I’m glad I was there.

I learned several things in that workshop that I’ve used ever since. For example, when you’re gesturing about something moving forward in time, move your hand from left to right from the audience’s perspective. Since English speakers (and for the audience of this workshop, Spanish speakers) read from left to right, we think of time progressing from left to right. If you see someone talking about time moving forward, but you see motion from right to left, you feel a subtle cognitive dissonance. (Presumably you should reverse this when speaking to an audience whose primary language is Hebrew or Arabic.)

Another lesson from that workshop, the one I want to focus on here, is that you don’t always need to convey how you arrived at an idea. Specifically, the leader of the workshop said that if you discover something interesting from reading the New Testament in Greek, you can usually present your point persuasively using the text in your audience’s language without appealing to Greek. This isn’t always possible—you may need to explore the meaning of a Greek word or two—but you can use Greek for your personal study without necessarily sharing it publicly. The point isn’t to hide anything, only to consider your audience. In a room full of Greek scholars, bring out the Greek.

This story came up in a recent conversation with Brent Yorgey about category theory. You might discover something via category theory but then share it without discussing category theory. If your audience is well versed in category theory, then go ahead and bring out your categories. But otherwise your audience might be bored or intimidated, as many people would be listening to an argument based on the finer points of Koine Greek grammar. Microsoft’s LINQ software, for example, was inspired by category theory principles, but you’d be hard pressed to find any reference to this because most programmers don’t want to know or need to know where it came from. They just want to know how to use it.

Some things may sound profound when expressed in esoteric language, such as category theory or Koine Greek, that don’t seem so profound in more down-to-earth language. Expressing yourself in a different language helps filter out pedantry from useful ideas. (On the other hand, some things that looked like pure pedantry have turned out to be very useful. Some hairs are worth splitting.)

Sometimes you have to introduce a new term because there isn’t a colloquial counterpart. Monads are a good example, a concept from category theory that has entered software development. A monad is what it is, and analogies to burritos and other foods don’t really help. Better to introduce the term and say plainly what it is.

* * *

More on applied category theory

New Twitter account for functional programming and categories

I’m starting a new Twitter account @FunctorFact for functional programming and category theory.

These two subjects have a lot of overlap, so some content will be strictly about programming, some strictly about math, and some will combine ideas from both.

FunctorFact icon

Prime factors, phone numbers, and the normal distribution

Telephone numbers typically have around three distinct prime factors.

The length of a phone number varies by country, but a US phone number is a 10-digit number, and 10-digit numbers are often divisible by three different prime numbers, give or take a couple. Assuming phone numbers are scattered among possible 10-digit numbers in a way that doesn’t bias their number of prime factors, these numbers will often have between 1 and 5 prime factors. If a country has 9- or 11-digit phone numbers, the result is essentially the same.

Let ω(n) be the number of distinct prime factors of n. Then the Erdős–Kac theorem says roughly that ω(n) is distributed like a normal random variable with mean and variance log log n. More precisely, fix two numbers a and b.  For a given value of x, count the proportion of positive integers less than x where

(ω(n) − log log n) / √(log log n)

is between a and b. Then in the limit as x goes to infinity, this proportion approaches the probability that a standard normal random variable is between a and b.

So by that heuristic, the number of distinct prime factors of a 10-digit number is approximately normally distributed with mean and variance log log 10^11 = 3.232, and a random variable with that distribution is between 1 and 5 around 73% of the time.
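
Here’s how you might verify that 73% figure (my calculation, using SciPy’s normal distribution):

    from math import log, sqrt
    from scipy.stats import norm

    m = log(log(10**11))  # mean and variance, about 3.232
    s = sqrt(m)           # standard deviation
    print(norm.cdf(5, m, s) - norm.cdf(1, m, s))  # about 0.73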

My business phone number, for example, is 8324228646. Obviously this is divisible by 2. In fact it equals 2 × 3² × 462457147, and so it has exactly three distinct prime factors: 2, 3, and 462457147.

Here’s how you could play with this using Python.

    from sympy.ntheory import factorint

    # number of distinct prime factors of n
    def omega(n):
        return len(factorint(n))

I looked in SymPy and didn’t see an implementation of ω(n) directly, but it does have a function factorint that returns the prime factors of a number, along with their multiplicities, in a dictionary. So ω(n) is just the size of that dictionary.

I took the first 20 phone numbers in my contacts and ran them through omega and got results consistent with what you’d expect from the theory above. One was prime, and none had more than five factors.
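
The tally itself is a one-liner. Here’s roughly what I ran, with made-up numbers standing in for my contacts:

    from collections import Counter

    # hypothetical phone numbers; substitute your own contacts
    numbers = [8324228646, 2025550143, 7135550101, 8005551212]
    print(Counter(omega(n) for n in numbers))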

Bar chart of number of prime factors in a sample of phone numbers with heights [1, 3, 5, 8, 3]

Five lemma, ASCII art, and Unicode

A few days ago I wrote about creating ASCII art in Emacs using ditaa. Out of curiosity, I wanted to try making the Five Lemma diagram. [1]

The examples on the ditaa site all have arrows between boxes, but you don’t have to have boxes.

Here’s the ditaa source:

A₀ ---> A₁ ---> A₂ ---> A₃ ---> A₄
|       |       |       |       |            
| f₀    | f₁    | f₂    | f₃    | f₄    
|       |       |       |       |      
v       v       v       v       v      
B₀ ---> B₁ ---> B₂ ---> B₃ ---> B₄

and here’s the image it produces:

Five lemma diagram

It’s not pretty. You could make a nicer image with LaTeX. But as the old saying goes, the remarkable thing about a dancing bear is not that it dances well but that it dances at all.

The trick to getting the subscripts is to use Unicode character 0x208n for subscript n. As I noted at the bottom of this post, ditaa isn’t strictly limited to ASCII art. You can use Unicode characters as well. You may or may not be able to see the subscripts in the source code, because they are not part of the most widely supported set of characters.
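
In Python, for example, you can generate these characters from their code points (a small illustration of my own):

    # subscript digits occupy code points U+2080 through U+2089
    print("".join("A" + chr(0x2080 + n) for n in range(5)))
    # prints A₀A₁A₂A₃A₄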

* * *

[1] The Five Lemma is a diagram-chasing result from homological algebra. It lets you infer properties of the middle map f₂ from properties of the other f’s.

Benford’s law, chi-square, and factorials

A while back I wrote about how the leading digits of factorials follow Benford’s law. That is, if you look at just the first digit of a sequence of factorials, they are not evenly distributed. Instead, 1’s are most popular, then 2’s, etc. Specifically, the proportion of factorials starting with n is roughly log10(1 + 1/n).

Someone has proved that the limiting distribution of leading digits of factorials exactly satisfies Benford’s law. But if we didn’t know this, we might use a chi-square statistic to measure how well the empirical results match expectations. As I argued in the previous post, statistical tests don’t apply here, but they can be useful anyway in giving us a way to measure the size of the deviations from theory.

Benford’s law makes a better illustration of the chi-square test than the example of prime remainders because the bins are unevenly sized, which they’re allowed to be in general. In the prime remainder post, they were all the same size.

The original post on leading digits of factorial explains why we compute the leading digits the way we do. Only one detail has changed: the original post used Python 2 and this one uses Python 3. Integer division was the default in Python 2, but now in Python 3 we have to use // to explicitly ask for integer division, floating point division being the new default.

Here’s a plot of the distribution of the leading digits for the first 500 factorials.

And here’s code to compute the chi-square statistic:

    from math import factorial, log10

    def leading_digit_int(n):
        while n > 9:
            n = n//10
        return n

    def chisq_stat(O, E):
        return sum( [(o - e)**2/e for (o, e) in zip(O, E)] )

    # Waste the 0th slot to make the code simpler.
    digits = [0]*10

    N = 500
    for i in range(N):
        digits[ leading_digit_int( factorial(i) ) ] += 1

    expected = [ N*log10(1 + 1/n) for n in range(1, 10) ]

    print( chisq_stat(digits[1:], expected) )

This gives a chi-square statistic of 7.693, very near the mean value of 8 for a chi-square distribution with eight degrees of freedom. (There are eight degrees of freedom, not nine, because if we know how many factorials start with the digits 1 through 8, we know how many start with 9.)
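
If you wanted to attach a (heuristic) p-value to that statistic, SciPy can do it. This is my addition, not part of the original analysis:

    from scipy.stats import chi2

    # probability of a chi-square statistic of 7.693 or larger
    # with 8 degrees of freedom
    print(chi2.sf(7.693, 8))  # about 0.46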

So the chi-square statistic suggests that the deviation from Benford’s law is just what we’d expect from random data following Benford’s law. And as we said before, this suggestion turns out to be correct.

Hypothesis testing and number theory

This post uses a hypothesis test for proportions to look at a couple conjectures in number theory. It is similar to my earlier post on the chi-square test and prime remainders. You could read this as a post on statistics or a post on number theory, depending on which you’re less familiar with.

Using statistical tests on number theory problems is kind of odd. There’s nothing random going on, so in that sense the whole enterprise is unjustified. Nevertheless, statistical tests can be suggestive. They certainly don’t prove theorems, but they can give reason to suspect a theorem is true or false. In that sense, applying statistical tests to number theory isn’t all that different from applying them to more traditional settings.

First we’ll look at the remainders of primes modulo 4. Except for 2, all primes are odd, and so they have remainder either 1 or 3 when divided by 4. Brian Hayes wrote recently that Chebyshev noticed in the 1850s that there seem to be more primes with remainder 3. Is the imbalance larger than one would expect to see from fair coin tosses?

Here’s some Python code to find the proportion of the first million primes (after 2) that have remainder 3 when divided by 4.

    from sympy import prime
    from math import sqrt

    n = 1000000
    rem3 = 0
    # prime(i) is the ith prime; starting at i = 2 skips the prime 2
    for i in range(2, n+2):
        if prime(i) % 4 == 3:
            rem3 += 1
    p_hat = rem3/n

This shows that of the first million odd primes, 500,202 are congruent to 3 mod 4. Would it be unusual for a coin to come up heads this many times in a million flips? To find out we’d compute the z-statistic:

z = (p̂ − p) / √(pq/n)

Here p is the proportion under the null hypothesis, q = 1 – p, and n is the sample size. In our case, the null hypothesis is p = 0.5 and n = 1,000,000. [1]

The code

    p = 0.5
    q = 1 - p
    z = (p_hat - p)/sqrt(p*q/n)

shows that z = 0.404, hardly a suspiciously large value. If we were looking at random values we’d see a z-score this large or larger 34% of the time. So in this case the test doesn’t suggest much.
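
That 34% figure is the normal survival function evaluated at 0.404; here’s a quick check (my addition):

    from scipy.stats import norm

    # probability that a standard normal exceeds 0.404
    print(norm.sf(0.404))  # about 0.34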

* * *

[1] The derivation of the z statistic is fairly quick. If the probability of success is p, then the number of successes out of n trials is binomial(n, p). For large n, this has approximately the same distribution as a normal distribution with the same mean and variance, mean np and variance npq. The proportion of successes then has approximately mean p and standard deviation √(pq/n). Subtracting the mean and dividing by the standard deviation normalizes the distribution to have mean 0 and variance 1. So under the null hypothesis, the z statistic has approximately a standard normal distribution.

ASCII art diagrams in Emacs org-mode

Yesterday I wrote about ASCII art diagrams and gave four reasons you might want to use this ancient approach to creating simple diagrams:

  • It could be quicker than creating a graphical image.
  • You can paste them into plain text documents like source code files.
  • They can be produced programmatically.
  • There is software to turn ASCII art into more polished images.

Today I’ll post a few notes about how to create graphical versions of ASCII diagrams in Emacs with org-mode.

Running code inside org-mode

You can embed and execute source code in org-mode files. I wrote a couple posts about this, one showing how to run Python and R inside org-mode and another showing how to mix languages in org-mode. The latter shows Perl calling R and Python, all in 14 lines of code.

There are currently 39 programming languages that org-mode can call by default. In addition to conventional programming languages like the ones listed above, org-mode also supports ditaa, the language that treats ASCII art as a specification language to produce graphics.

You can edit code blocks just as you would other text in an org-mode file. But if you’d like to edit a code block in its language’s mode, type C-c ' from inside the code block. If you’re editing Python code, for example, this will open a new window, in Python mode, containing just that code block. If you type C-c ' inside a ditaa code block, Emacs opens a new window in “artist mode,” a mode specifically for editing ASCII art.

You can run code inside org-mode two ways: interactively and for export. With your cursor inside a block of code, type C-c C-c to execute the code and report the results. You can also export an entire org-mode document and have the results of code execution embedded in the final document. This works much the same as “*weave” projects like Sweave, Pweave, and Hweave. But while each of these is specific to a particular programming language (R, Python, and Haskell respectively), org-mode works with dozens of languages, including ditaa.

Running ditaa inside org-mode

You embed ditaa code just like you’d embed any other code. In the first post mentioned above, I gave this example of calling R:

    #+begin_src R
    sqrt(42)
    #+end_src

Here’s the analogous code for ditaa:

    #+BEGIN_SRC ditaa :file foo.png
    +-------+
    | Hello |
    +-------+
    #+END_SRC

The markers to begin and end a source code segment can be upper or lower case. I used lower case in my previous post, but it may be more readable to use upper case so that the markers stand out better from their arguments.

The R example above didn’t use any header arguments, though it could have. With ditaa, you must provide a header argument: the name of the file to save the graphics in.

If you run the ditaa code by navigating inside the code block and running C-c C-c, Emacs will add a couple lines after the code block:

    #+RESULTS:
    [[file:foo.png]]

This is the literal text, what you’d see if you opened the file in another editor. But org-mode uses the double brackets for links, so you wouldn’t see the double brackets. Instead you’d see a hyperlink with the text file:foo.png. Clicking on that link opens the image.

Hello image produce by ditaa

You can export the org-mode file with the command C-c C-e. This brings up a menu of export options: h for HTML, l for LaTeX, etc. Within each of these are further options. For example, you could type l for LaTeX and then type l again to select export to LaTeX, or type p to have org-mode run LaTeX on the file and produce a PDF. If you know you want a PDF, you could do this all in one command: C-c C-e l p.

You can control whether org-mode exports code or the results of running code (or both or neither). Org-mode exports the results of ditaa code, i.e. graphics files, by default. This makes sense: your PDF file will have a nice image version of your diagram, not the ASCII art used as its source code.
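
That default is controlled by the :exports header argument. For example, to include both the ditaa source and the resulting image in the export, you could modify the earlier block like this:

    #+BEGIN_SRC ditaa :file foo.png :exports both
    +-------+
    | Hello |
    +-------+
    #+END_SRC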

Configuration and troubleshooting

By default, the only programming language you can run inside org-mode is Emacs Lisp. This makes sense. You want to be deliberate about what code you run, but if you don’t want to run Emacs Lisp you’d better not run Emacs!

Inside your Emacs configuration file, you specify what languages org-mode is allowed to run. Here’s an example allowing Python and ditaa:

    
    (org-babel-do-load-languages
     'org-babel-load-languages
     '((python . t)
       (ditaa . t)))

Recent versions of Emacs are supposed to ship with ditaa, but the ditaa jar file was missing from the two computers I experimented with. Emacs complained

    org-babel-execute:ditaa: Could not find ditaa.jar at ...

and so I copied ditaa.jar to the place where Emacs said it was looking for it. That worked, but it’s kind of a kludge because now I had two copies of the jar file. A better solution is to tell Emacs where you already have ditaa.

I like to use the exact same init.el file on every machine I use and so I added a section where I put OS-specific and machine-specific configuration. Here I put a different path to ditaa for each OS.

    ;;----------------------------------------------------------------------
    ;; OS specific

    (cond
        ((string-equal system-type "windows-nt") ; Microsoft Windows
            (progn
                (setq-default ispell-program-name "C:/bin/Aspell/bin/aspell.exe")
                ;; etc.
                (setq org-ditaa-jar-path "c:/bin/ditaa/ditaa.jar")
            )
        )
        ((string-equal system-type "gnu/linux") ; Linux
            (progn
                (setq x-select-enable-clipboard t)
                ;; etc.
                (setq org-ditaa-jar-path "/usr/bin/ditaa")
            )
        )
    )

ASCII art diagrams

“Technology is additive.” — Kevin Kelly

Old technologies never die. Instead, their range of application shrinks. Or maybe it grows when conditions change.

ASCII art, drawing pictures with fixed-width plain text characters, is no longer how many people want to produce diagrams. Just fire up Adobe Illustrator and you get incomparably more resolution of expression.

And yet there are times when ASCII art comes in handy. You can, for example, paste it into source code files. Someone more familiar with Emacs than Illustrator may be able to produce a simple diagram in the former faster than the latter. And it can be relatively easy to programmatically produce a large number of ASCII art diagrams, depending on the nature of the diagrams.

It’s also possible to use ASCII art as a way to specify nicely rendered images. I’ll show how to do that with ditaa below.

Here’s an ASCII version of the conjugate prior diagram I made some time ago:

 

                          +-------------+
                          |             |
                          | Exponential |
                          |             |
                          +-------------+
                                 |
                          lambda |                                                          
                                 v
+-------------+           +-------------+           +-------------+
|             |   tau     |             |   lambda  |             |
|  Lognormal  |---------->|    Gamma    |<----------|   Poisson   |
|             |           |             |---+       |             |
+-------------+           +-------------+   |       +-------------+
      |                          ^   ^      | beta
      |                          |   |      |
      |                          |   +------+
      |                      tau |           
      |                          |           
      |                   +-------------+   
      |        mu         |             |   
      +------------------>|    Normal   |
                          |             |----+
                          +-------------+    | 
                                     ^       | mu
                                     |       |
                                     +-------+

And here’s the image produced by ditaa processing the ASCII diagram above:

Conjugate prior diagram produced by ditaa

Update: See my next post on how to create ASCII art diagrams and their graphic version from ditaa using Emacs org mode.

Update: When I first made the diagram above, I tried using Greek letters, e.g. using β rather than “beta,” but this didn’t work. I thought “OK, I suppose it’s not really ASCII art if you use non-ASCII characters.” But someone told me Unicode characters worked for him, so I tried again when I wrote the follow up post and it worked.

My first attempt, from a Windows laptop, calling ditaa from the command line, did not work. My second attempt, running inside org-mode from a Windows desktop, did work. My third attempt, running from Emacs on Linux also worked.

ditaa diagram using Unicode symbols

Interview with Chris Toomey of Upcase

The other day I spoke to Chris Toomey from thoughtbot. Chris runs Upcase, thoughtbot’s online platform for learning about Rails, test-driven development, clean code, and more. I was curious about his work with Ruby on Rails since I know little about that world. And at a little deeper level, I wanted to get his thoughts on how programming languages are used in practice, static vs dynamic, strongly typed vs weakly typed, etc.

Chris Toomey

JC: Chris, I know you do a lot of work with Ruby on Rails. What do you think of Ruby without Rails? Would you be as interested in Ruby if the Rails framework had been written in some other language?

CT: Let me back up a little bit and give you some of my background. I started out as an engineer and I used VB because it was what I had for the task at hand. Then when I decided to buckle down and become a real developer I chose Python because it seemed like the most engineering-oriented alternative. It seemed less like an enterprise language, more small and nimble. I chose Python over Ruby because of my engineering background. Python seemed more serious, while Ruby seemed more like a hipster language. Ruby sounded frivolous, but I kept hearing good things about it, especially with Rails. So like a lot of people I came to Ruby through Rails. It was the functionality and ease of use that got me hooked, but I do love Ruby as a language, the beauty and expressiveness of it. It reads more like prose than other languages. It’s designed for people rather than machines. But it’s also a very large language and hard to parse because of that. Over time though I’ve seen people abuse the looseness, the freedom in Ruby, and that’s caused me to look at stricter options like Haskell and other functional languages.

JC: I only looked at Ruby briefly, and when I saw the relative number of numerical libraries for Python and Ruby I thought “Well, looks like it’s Python for me.”

It seems like Ruby bears some resemblance to Perl, for better or worse.

CT: Absolutely. Ruby has two spiritual ancestors. One is Perl and the other is Smalltalk. I think both of those are great, and many of the things I love about Ruby come from that lineage. Perl contributed the get-things-done attitude, the looseness and terseness, the freedom to interact at any level of abstraction.

It’s kinda odd. I keep coming back to The Zen of Python. One of the things it says is that explicit is better than implicit, and I really think that’s true. And yet I work in Ruby and Rails where implicit is the name of the game. So I have some cognitive dissonance over that. I love Ruby on Rails, but I’m starting to look at other languages and frameworks to see if something else might fit as well.

JC: Do you have the freedom to choose what language and framework you work in? Do clients just ask for a web site, or do they dictate the technology?

CT: We have a mix. A lot of clients just want a web app, but some, especially large companies, want us to use their technology stack. So while we do a lot of Rails, we also do some Python, Haskell, etc.

JC: Do you do everything soup-to-nuts or do you have some specialization?

CT: We have three roles at thoughtbot: designer, web developer, and mobile developer. The designers might do some JavaScript, but they’re mostly focused on user experience, testing, and design.

JC: How do you keep everything straight? The most intimidating thing to me about web development is all the diverse tools in play: the language for your logic, JavaScript, CSS, HTML, SQL, etc.

CT: There’s definitely some of that, but we outsource some parts of the stack. We host applications on Heroku, giving them responsibility for platform management. They run on top of AWS so they handle all the scaling issues so we can focus on the code. We’ll deploy to other environments if our client insists, but our preference is to go with Heroku.

Similarly, Rails has a lot of functionality for the database layer, so we don’t write a lot of SQL by hand. We’re all knowledgeable of SQL, but we’re not DBA-level experts. We scale up on that as necessary, but we want to focus on the application.

JC: Shifting gears a little bit, how do you program differently in a dynamic language like Ruby than you would in a stricter language like C++? And is that a good thing?

CT: One thing about Ruby, and dynamic languages in general, is that testing becomes all the more critical. There are a lot of potential runtime errors you have to test for. Whereas with something like Haskell you can program a lot of your logic into the type system. Ruby lets you work more freely, but Haskell leads to more robust applications. Some of our internal software at thoughtbot is written in Haskell.

JC: I was excited about using Haskell, but when I used it on a production project I ran into a lot of frustrations that you wouldn’t anticipate from working with Haskell in the small.

CT: Haskell does seem to have a more aggressive learning curve than other languages. There’s a lot of academia in it, and in a way that’s good. The language hasn’t compromised its vision, and it’s been able to really develop some things thoroughly. But it also has a kind of academic heaviness to it.

There’s a language out there called Elm that’s inspired by Haskell and the whole ML family of languages that compiles down to JavaScript. It presents a friendlier interface to the whole type-driven, functional way of thinking. The developers of the language have put a lot of effort into making it approachable, without having to understand comonads and functors and all that.

JC: My difficulties with Haskell weren’t the theory but things like the lack of tooling and above all the difficulty of package management.

CT: Cabal Hell.

JC: Right.

CT: My understanding is that that’s improved dramatically with new technologies like Stack. We’re scaling up internally on Haskell. That’s the next area we’d like to get into. I’ll be able to say more about that down the road.

* * *

Check out Upcase for training materials on tools like Vim, Git, Rails, and Tmux.