Data swamps


I recently heard the term “data swamp,” a humorous take on data lakes. I thought about data swamps yesterday when I hiked past the literal swamp in the photo above.

Swamps are a better metaphor than lakes for heterogeneous data collections because a lake is a homogeneous body of water. What makes a swamp a swamp is its heterogeneity.

The term “data lake” may bring up images of clear water, maybe an alpine lake like Lake Tahoe straddling California and Nevada. “Data swamp” makes me think of Louisiana’s Atchafalaya Basin, which is probably a more appropriate image for most data lakes.

One thought on “Data swamps”

  1. Spot on.

    … Interesting, the metaphors that get attracted to this stuff.

    “Data has gravity” is the trendy way of saying “where I store a bunch of one type of data is where I’m likely to store another bunch of data.”

    And then some marketing genius comes by and says “hey, with all that data in one place you oughta do something with it…” and it’s off to the races.

