How to select distinct on one column & any in another column?

Asked by GillianHamer in SQL Server, asked on Oct 3, 2022

 I need to query an SQL database to find all distinct values of one column and I need an arbitrary value from another column. For example, consider the following table with two columns, key and value:

key     value
===     =====
one     test
one     another
one     value
two     goes
two     here
two     also
three   example

I wish to get back one sample row, chosen arbitrarily, from each distinct key, perhaps getting these three rows:

key     value
===     =====
one     test
two     goes
three   example

How can I formulate such a query in SQL?


Answered by Gloria Holl

To select distinct values of one column together with an arbitrary value from another column, there are several options, depending on the DBMS:


The easiest query to write is for MySQL in non-strict mode (ONLY_FULL_GROUP_BY disabled). It uses a non-standard construction that selects a column which is neither grouped nor aggregated:

-- KEY is a reserved word in MySQL, so the column name needs backticks
SELECT `key`, value
FROM tableX
GROUP BY `key` ;

In recent versions (5.7 and 8.0+), where strict mode and ONLY_FULL_GROUP_BY are enabled by default, you can use the ANY_VALUE() function, added in 5.7:

-- KEY is a reserved word in MySQL, so the column name needs backticks
SELECT `key`, ANY_VALUE(value) AS value
FROM tableX
GROUP BY `key` ;

For other DBMSs that have window functions (such as PostgreSQL, SQL Server, Oracle, DB2), you can use ROW_NUMBER(). The advantage is that you can select other columns in the result as well (besides the key and value):

SELECT key, value
FROM
    ( SELECT key, value,
             ROW_NUMBER() OVER (PARTITION BY key
                                ORDER BY whatever)  -- for example, ORDER BY NULL
               AS rn
      FROM tableX
    ) tmp
WHERE rn = 1 ;
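
To try the window-function query on the sample data from the question, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the DBMSs named above. It assumes SQLite 3.25 or later (the first version with window functions). The id column and the first_row_per_key helper are not part of the original question; they are added here only to show that extra columns can be carried along and to give ROW_NUMBER() something concrete to order by.

```python
import sqlite3


def first_row_per_key(conn):
    """Return one full row per key, picking the row with the smallest id."""
    # "key" is double-quoted because KEY is a reserved word in some DBMSs.
    query = """
        SELECT id, "key", value
        FROM (
            SELECT id, "key", value,
                   ROW_NUMBER() OVER (PARTITION BY "key" ORDER BY id) AS rn
            FROM tableX
        ) AS tmp
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
# The id column is a hypothetical addition to the two-column table in the question.
conn.execute('CREATE TABLE tableX (id INTEGER PRIMARY KEY, "key" TEXT, value TEXT)')
conn.executemany(
    'INSERT INTO tableX ("key", value) VALUES (?, ?)',
    [
        ("one", "test"), ("one", "another"), ("one", "value"),
        ("two", "goes"), ("two", "here"), ("two", "also"),
        ("three", "example"),
    ],
)

rows = first_row_per_key(conn)
print(rows)  # [(1, 'one', 'test'), (4, 'two', 'goes'), (7, 'three', 'example')]
```

Ordering by id makes the choice repeatable. If any row will do, the ORDER BY inside OVER can be replaced by whatever constant ordering the target DBMS accepts.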

For DBMS versions that lack window functions, a general approach that works almost everywhere is to aggregate with MIN() or MAX(). One disadvantage is that you cannot select other columns with this approach. Another is that aggregate functions like MIN() and MAX() do not work with some data types in some DBMSs (like bit, text, blobs):

SELECT key, MIN(value) AS value
FROM tableX
GROUP BY key ;
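
The limitation about other columns can be seen in a small experiment. The sketch below again uses Python with an in-memory SQLite database as a stand-in. The note column and its values are made up for illustration. With one aggregate the query returns one value per key, but as soon as a second column is aggregated independently, the result can combine values from different rows.

```python
import sqlite3

# Hypothetical sample data: (key, value, note). The note column is an
# illustrative addition, not part of the table in the question.
rows = [
    ("one", "test", "c"),
    ("one", "another", "z"),
    ("two", "goes", "b"),
    ("two", "also", "y"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tableX ("key" TEXT, value TEXT, note TEXT)')
conn.executemany("INSERT INTO tableX VALUES (?, ?, ?)", rows)

# One value per key: this is the query from the answer.
one_per_key = conn.execute(
    'SELECT "key", MIN(value) FROM tableX GROUP BY "key" ORDER BY "key"'
).fetchall()
print(one_per_key)  # [('one', 'another'), ('two', 'also')]

# Aggregating a second column independently mixes values from different rows.
mixed = conn.execute(
    'SELECT "key", MIN(value), MIN(note) FROM tableX GROUP BY "key" ORDER BY "key"'
).fetchall()
print(mixed)  # [('one', 'another', 'c'), ('two', 'also', 'b')]

# None of the combined tuples exists as a row in the table.
print([m in rows for m in mixed])  # [False, False]
```

This is why the window-function form is usually the better fit when whole rows are needed rather than a single sample value.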

PostgreSQL has a non-standard DISTINCT ON clause that can also be used. The optional ORDER BY determines which row of each group is returned:

SELECT DISTINCT ON (key) key, value
FROM tableX
-- ORDER BY key, ...
;




Answer (1)

To select distinct values of one column while allowing any value in another column, you can use the DISTINCT ON clause (specific to PostgreSQL) or, for more portability across SQL databases, a window function such as ROW_NUMBER().

Here are approaches for PostgreSQL and for databases that support standard SQL:

PostgreSQL: Using DISTINCT ON

PostgreSQL has a convenient DISTINCT ON clause which makes this task straightforward.

SELECT DISTINCT ON (column1) column1, column2
FROM your_table
ORDER BY column1, some_other_column;

column1 is the column you want distinct values for.

column2 can be any value associated with the distinct column1.

some_other_column is used to determine which row to choose for column2 when there are multiple rows with the same column1.

Example:

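Suppose you have a table employees with columns department and employee_name, and you want distinct departments with any employee name. Ordering by employee_name makes the choice deterministic: each department is paired with its alphabetically first employee.

SELECT DISTINCT ON (department) department, employee_name
FROM employees
ORDER BY department, employee_name;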

Standard SQL: Using Window Functions (ROW_NUMBER)

If you're using a database that doesn't support DISTINCT ON, you can achieve similar results using window functions.

WITH ranked AS (
    SELECT column1, column2,
           ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY some_other_column) AS rn
    FROM your_table
)
SELECT column1, column2
FROM ranked
WHERE rn = 1;

Example:

Using the same employees table:

WITH ranked AS (
    SELECT department, employee_name,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY employee_name) AS rn
    FROM employees
)
SELECT department, employee_name
FROM ranked
WHERE rn = 1;

Explanation:

Window Function: ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY some_other_column) numbers the rows within each partition of column1 sequentially, starting at 1, in the order specified by some_other_column.

Common Table Expression (CTE): The CTE named ranked includes these row numbers.

Final Selection: The outer query selects rows where the row number is 1, ensuring that only the first occurrence of each column1 is selected.
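
The effect of the ORDER BY inside OVER can be checked with a short experiment. This is a minimal sketch in Python against an in-memory SQLite database (assuming SQLite 3.25 or later for window-function support). The employee names, the department names, and the one_employee_per_department helper are hypothetical and exist only for this illustration. Flipping the sort direction changes which row receives rn = 1, and therefore which employee is returned for each department.

```python
import sqlite3

# The CTE query from above, with the sort direction left as a placeholder.
QUERY = """
WITH ranked AS (
    SELECT department, employee_name,
           ROW_NUMBER() OVER (
               PARTITION BY department
               ORDER BY employee_name {direction}
           ) AS rn
    FROM employees
)
SELECT department, employee_name
FROM ranked
WHERE rn = 1
ORDER BY department
"""


def one_employee_per_department(conn, descending=False):
    """Return one (department, employee_name) pair per department."""
    # Only the fixed strings ASC/DESC are interpolated, never user input.
    direction = "DESC" if descending else "ASC"
    return conn.execute(QUERY.format(direction=direction)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, employee_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Sales", "Dana"), ("Sales", "Alex"), ("IT", "Robin"), ("IT", "Casey")],
)

first = one_employee_per_department(conn)
last = one_employee_per_department(conn, descending=True)
print(first)  # [('IT', 'Casey'), ('Sales', 'Alex')]
print(last)   # [('IT', 'Robin'), ('Sales', 'Dana')]
```

Either result satisfies "one row per department"; the ordering only decides which of the candidate rows is kept.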

By using these methods, you can effectively select distinct values from one column while allowing flexibility in another column across different SQL databases.






