Avoid DISTINCT postgres queries
AWS published a blog post on optimizing RDS postgres queries. It contains one bit of advice that I keep revising quite often. Since I have trouble finding that post each time, I quoted the advice here separately.
The following is an excerpt from Optimizing and tuning queries in Amazon RDS PostgreSQL based on native and external tools by Baji Shaik and Domenico di Salvia.
Most of the developers use DISTINCT
when using joins as below:
SELECT DISTINCT d.dept_no,
d.dept_name
FROM departments d,
dept_emp e
WHERE e.dept_no = d.dept_no;
When tables involved in the query have one-to-many relationship, use EXISTS instead of DISTINCT as below. It gives better performance:
SELECT d.dept_no,
d.dept_name
FROM departments d
WHERE EXISTS (SELECT 'X'
FROM dept_emp e
WHERE e.dept_no = d.dept_no);