SQLite: Dropping an index for a 300x speedupPublished on:
For a small, personal project I use sqlite as a time series database and make queries like the following:
SELECT referer, COUNT(*) AS views FROM logs WHERE host = 'comments.nbsoftsolutions.com' AND epoch >= 1551630947 AND epoch < 1551632947 GROUP BY referer ORDER BY views DESC
Over time, I’ve noticed that this query has been increasingly time consuming, sometimes taking minutes to complete. I thought I created proper indices:
CREATE index idx_epoch on logs(epoch); CREATE index idx_host ON logs(host);
ANALYZE, and still no changes. I was at my wits end, seriously thought about looking into another database. Then I enabled sqlite’s
EXPLAIN QUERY PLAN and saw the following output.
0|0|0|SEARCH TABLE logs USING INDEX idx_host (host=?) 0|0|0|USE TEMP B-TREE FOR GROUP BY 0|0|0|USE TEMP B-TREE FOR ORDER BY
Omitting the epoch timestamp index from a time series table is a red flag. SQLite can only use one index from each table and it was choosing the wrong one.
After crawling sqlite docs, I found a way to disable the host index using the unary “+” operator.
Now our query looks like:
SELECT referer, COUNT(*) AS views FROM logs WHERE +host = 'comments.nbsoftsolutions.com' AND epoch >= 1551630947 AND epoch < 1551632947 GROUP BY referer ORDER BY views DESC
And the new query plan picks up on our hint:
EARCH TABLE logs USING INDEX idx_epoch (epoch>? AND epoch<?) 0|0|0|USE TEMP B-TREE FOR GROUP BY 0|0|0|USE TEMP B-TREE FOR ORDER BY
Execution time decrease from 30 seconds to 0.1 second, a 300x speedup by dropping the
host index from consideration.
Being able to write raw sql like this is a major reason why I tend to have a disdain for ORMs that abstract everything away. Sometimes we need to go to a lower level.
Let’s see if we can’t tease out why sqlite naively makes the wrong decision. From the docs on choosing from multiple indices:
sqlite_stat1table might indicate that an equality constraint on column x reduces the search space to 10 rows on average, whereas an equality constraint on column y reduces the search space to 3 rows on average
The contents of
sqlite_stat1 in my scenario:
logs|idx_host|3165918 7685 logs|idx_epoch|3165918 2
This table states that an equality constraint on
host reduces the search space to an average of 7685, and an average of 2 for
epoch. Considering that our epoch range is [1551630947, 1551632947), a 2000 difference, I would have expected that sqlite would have realized that (2 * 2000 < 7685). Even updating the estimate that
idx_host index narrows the results down to 10 million rows changes nothing:
UPDATE sqlite_stat1 SET stat = '3165918 9999999' WHERE idx = 'idx_host';
Thus we can conclude that SQLite will always prefer an equality constraint in index evaluations versus any range constraint. So update your queries or drop offending indices if you have to.
If this is not true, or anyone has new / differing information, feel free to comment. This is with sqlite 3.22.0.
If you'd like to leave a comment, please email email@example.com