I was looking at ways to squeeze a bit more performance out of a query yesterday, and came accross this:

http://www.mysqlperformanceblog.com/2007/04/06/using-delayed-join-to-optimize-count-and-limit-queries/

Take this example (structure doesn’t really matter):

SELECT *
FROM table1
LEFT JOIN table2 ON table1.fk=table2.pk
LEFT JOIN table3 ON table1.fk2=table3.id
WHERE table1.field=’123′ |
ORDER BY table1.field2
LIMIT 20, 30

If you’ve got a lot of joined records in table2 or 3, then the number of rows in the temporary table that mysql makes for the join can be massive.

In the example I was looking at, it was generating over half a million rows in the temporary table for a query like this. The dataset is fairly large, has more than two tables joined – i used 2 above to keep it simple, and in some cases a large number of linked rows from table2/table3 (which multiply up the row count in the join very quickly!). This meant the temporary table had to be created on disk, rather than in memory, and also didn’t cache. Result was 15 second queries running frequently, causing long page loads and at times loading the server up and slowing down other areas of the site too (I think!).

But MySQL doesn’t have to do this – the WHERE and ORDER BY clauses apply only to table1 (if they reference other tables, then you will probably need to do the joins before you can do the WHERE/ORDER BY/LIMIT) – so it should be possible for the query optimiser to work out that it can apply these to table1, along with the limit clause too, and then join on the extra tables to just the resulting rows. But the query optimiser doesn’t do this, it seems to perform the joins earlier in the process – so the temporary table includes rows that get excluded by the limit clause before the result is returned (From my tests, I think applying the limit is the key – the order by and where don’t impact performance massively, but have to be done before the limit, so need to go into the subquery too).

To make MySQL take the more efficient path outlined above, you can adjust the query as follows:

SELECT *
FROM
(SELECT * FROM table1 WHERE table1.field=’123′ ORDER BY table1.field2 LIMIT 20, 30) as table1
LEFT JOIN table2 ON table1.fk=table2.pk
LEFT JOIN table3 ON table1.fk2=table3.id
ORDER BY table1.field2

In short, what is happening is we are using a subquery to force mysql to select and limit the rows from table1, before the joins are done. This dramatically reduced the time to execute the query compared to the original, and as an added bonus, the queries cache. Execution time comes down from 10 – 15 seconds to under half a second, and then <0.001 seconds from cache.

Your mileage may vary – it all depends on how many rows are getting pulled into your temporary table – which in turn depends on the nature (and quantity) of data you are working with.