Avoid DISTINCT postgres queries

AWS published a blog post on optimizing RDS postgres queries. It contains one bit of advice that I keep revising quite often. Since I have trouble finding that post each time, I quoted the advice here separately.

The following is an excerpt from Optimizing and tuning queries in Amazon RDS PostgreSQL based on native and external tools by Baji Shaik and Domenico di Salvia.


Most of the developers use DISTINCT when using joins as below:

SELECT DISTINCT d.dept_no, 
                d.dept_name 
FROM   departments d, 
       dept_emp e 
WHERE  e.dept_no = d.dept_no; 

When tables involved in the query have one-to-many relationship, use EXISTS instead of DISTINCT as below. It gives better performance:

SELECT d.dept_no, 
       d.dept_name 
FROM   departments d 
WHERE  EXISTS (SELECT 'X' 
               FROM   dept_emp e 
               WHERE  e.dept_no = d.dept_no); 


Date
July 30, 2024