At clypd, we have been using PostgreSQL and it has been a great experience. We are impressed with its simplicity, good documentation, active community, abundance of functions and lot more. This post jots lessons learned and insights acquired during a recent journey in improving performance of one of our queries. Most resources and tools in this post point to other sites, however we think it would be helpful to list various ‘performance improvement’ resources in one page.
Our query involved 3 sub queries, each sub query performing joins over 2-3 tables; results from the sub queries are joined, aggregated and sorted. The query probably has same number of words as this post (ok…at least half). For easier reading we will group our findings in three buckets.
Early in the process we found that there were several small changes we could make to the first version of query implementation. These tips required a bit more insight into the workings of the query planner, but were pretty easy once we realized how the query was being executed:
Once we had eliminated the low-hanging fruit, we had to dig deeper into the inner workings of the Postgres query planner to really see how it was executing our query, and understand what we could do to make this job easier:
SELECT tx.xa, tx,xb, SUM(ty.ya)
FROM tx
JOIN ty on tx.id=ty.id
JOIN tz ON tx.id=tz.id
AND tx.xq = tz.zq AND tz.date between X and Y
GROUP BY tx.xa,tx.xb
ORDER BY tx.xa, tx.xb
In our case moving the tz filter into a WHERE EXISTS clause replaced MERGE JOIN with a HASH JOIN. Merges joins sort data on both sides before doing the join. Here’s the restructured query
SELECT tx.xa, tx,xb, SUM(ty.ya)
FROM tx
JOIN ty on tx.id=ty.id
WHERE EXISTS ( SELECT 1
FROM tz
WHERE tx.id=tz.id AND tz.xr = tz.zr
AND tz.date between X and Y)
GROUP BY tx.xa,tx.xb
ORDER BY tx.xa, tx.xb
Apart from tuning the queries themselves, there is definitely some performance improvement to be gained from ensuring you are getting the most from your Postgres server’s hardware. Everyone’s setup is different here, but this is how we got the most from our machines:
select name, setting from pg_settings where name = 'shared_buffers';
This is by no means a complete list of tips and tricks, simply an aggregation of resources and tips we found useful. We will post updates in this space as we learn more. Which tools do you use to get the most out of your PostgreSQL?
Wrestling databases into submission is one of many things that Sumit Shah does as Principal Engineer at clypd.