Saving money on big queries

I was surprised the first time a client told me that a query would cost them $100,000 to run. If you think about querying a database on your laptop, a long query might take a minute, and what’s the cost of a minute’s worth of electricity? Too small to meter.

But some of my clients have a LOT of data. (Some also come to me with spreadsheets of data.) If you have billions of records hosted in a cloud database like Snowflake, the cost of a query isn’t negligible, especially if it involves complex joins.

There are three ways to reduce the cost of expensive queries.

First, it may be possible to solve the underlying problem a different way, without running the expensive query at all. Maybe the query is the most direct approach, but there’s a more economical indirect approach.

Second, it may be possible to rewrite the query into a logically equivalent query that runs faster.
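To make the second approach concrete, here is a toy sketch in pandas. (The tables, columns, and sizes are invented for illustration; the same idea applies to SQL in a cloud warehouse.) It shows one common rewrite, predicate pushdown: filtering each table before the join rather than after. The two forms return the same rows, but the second joins far less data. A good query optimizer will often do this rewrite for you, but not always, particularly across views and subqueries.

```python
import numpy as np
import pandas as pd

# Invented data standing in for two large tables.
rng = np.random.default_rng(0)
orders = pd.DataFrame({
    "customer_id": rng.integers(0, 100_000, size=1_000_000),
    "amount": rng.random(1_000_000),
    "year": rng.integers(2015, 2025, size=1_000_000),
})
customers = pd.DataFrame({
    "customer_id": np.arange(100_000),
    "region": rng.choice(["NA", "EU", "APAC"], size=100_000),
})

# Naive form: join everything, then filter the joined result.
slow = orders.merge(customers, on="customer_id")
slow = slow[(slow["year"] == 2024) & (slow["region"] == "EU")]

# Logically equivalent form: filter first, then join the much smaller pieces.
fast = orders[orders["year"] == 2024].merge(
    customers[customers["region"] == "EU"], on="customer_id"
)

assert len(slow) == len(fast)  # same rows either way
```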

Third, it may be possible to save a tremendous amount of money by tolerating a small probability of error. For example, you may be able to use reservoir sampling rather than an exhaustive query.

This last approach is often misunderstood. Why would you tolerate any chance of error when you could have the exact answer? One reason is that the exact answer might cost 1000x as much to obtain. More importantly, the “exact” result may be the exact answer to a query that is itself trying to estimate something else. The error introduced by random sampling may be small relative to the error intrinsic in the problem being solved.
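The sampling idea can be made concrete with classic reservoir sampling (Vitter’s Algorithm R), sketched below in Python. It draws a uniform random sample of fixed size k in a single pass over a stream of unknown length, using memory proportional only to k. The stream and sample size here are invented for illustration.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Uniform random sample of k items from a stream of unknown length.

    One pass, O(k) memory: Vitter's Algorithm R.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # keep item i with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Estimate the mean of a column from a 10,000-row sample
# instead of scanning every row.
values = (x * x % 97 for x in range(10_000_000))  # stand-in for a big column
sample = reservoir_sample(values, k=10_000, seed=42)
estimate = sum(sample) / len(sample)
```

For estimating a mean, the standard error of a simple random sample is σ/√n, so a sample of ten thousand rows has standard error σ/100 whether the full table holds millions of rows or billions.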

If you’d like me to take a look at how I could reduce your query costs, let’s talk.
