How to Make a Query 280,000 Times Faster

ProcoreDev, Jake Sheehy

01 Dec 2020 • 1 min read

We usually make small iterative improvements in performance. However, on rare occasions, an opportunity arises to make an improvement of five orders of magnitude by changing a single line!

After learning about a slow query that was hitting our statement timeout from one of our Site Reliability Engineers, I reformatted it for ease of readability and ran it through EXPLAIN ANALYZE. The query was about 80 lines long and took me a while to look through. In the middle of the query, there was a curious join:

...
INNER JOIN table2 ON table1.table2_id = 123456
...

I thought to myself: I bet that doesn't do what they think it does.

Notice that this doesn't specify a column on the right side to join with. I thought this would produce a partial Cartesian product, but I always like to verify such things.

So, I decided to take a closer look. The query on the left joins in the same way the original query does. The query on the right specifies a normal join condition plus an additional WHERE clause. Please, note the result set: ~66 million on the left vs. 48 on the right.

Original JOIN	JOIN w/ Right Column Specified
SELECT table1.id, table2.id FROM table1 INNER JOIN table2 ON table1.table2_id = 123456;	SELECT table1.id, table2.id FROM table1 INNER JOIN table2 ON table1.table2_id = table2.id WHERE table2.id = 123456;
Rows returned: 66,408,624	Rows returned: 48

Original JOIN

JOIN w/ Right Column Specified

SELECT 
  table1.id,
  table2.id
FROM
  table1
  INNER JOIN table2 
    ON table1.table2_id = 123456;

SELECT
  table1.id,
  table2.id
FROM
  table1
  INNER JOIN table2 
    ON table1.table2_id = table2.id
WHERE
  table2.id = 123456;

Rows returned: 66,408,624

Rows returned: 48

Without the right-side column specified, we end up with the Cartesian product of all rows in table1 where table2_id is 123456, and every row in table2. By specifying the right-side column and moving the condition to the WHERE clause, we get the expected result set.

So, if we bring that back to our original query, we find quite a decent improvement!

Original Query Run Time	New Query Run Time
5588997.823 ms (01:33:08.988)	19.586 ms

You see, the mistake is pretty simple. By thinking, “Hey, if I know the id, I can just slap that into the join,” you just end up with that Cartesian product and some sad Database Administrators without a right-side column to join against.

As exciting as this was to work on, let’s not allow a problem of this size to slip through code reviews in the future. We created a danger rule (https://danger.systems/ruby/) alert as guardrails to enable teams to spot similar issues.

Sign up for more like this.