How to order and partition by multiple columns?

16.7K    Asked by DavidEDWARDS in SQL Server , Asked on Mar 20, 2023

I am using PostgreSQL.

I have a table with 2 columns in this example:

I want to add a new column with a unique id corresponding to partitions by name and category as shown in the result. Then, I want to take a random sample choosing 2 (or more) unique ids because under each unique id, there will be a lot of other historical data.


I have tried this so far, but I get 1s for everything. I'm missing something really simple here, but how do I correct my mistake? I have to do this operation for several million rows in the real table.


SELECT
  dense_rank() over (partition by name, category order by category) as unique_id,
  *
FROM
 example_table
After this, presumably I'll have to use RAND() somewhere but how do I do this?

This is my naive approach to get the solution in the above pic.

with ranks (
  ...
)
select * from ranks where unique_id = 2 or unique_id = 4
Answered by David EDWARDS

Use ROW_NUMBER() function better for order and partition by multiple columns.


SELECT
    name,
    category,
    ROW_NUMBER() OVER (PARTITION BY name, category ORDER BY name, category) unique_id
FROM
    test;
name | category | unique_id
:--- | :------- | --------:
A | Alpha | 1
A | Alpha | 2
A | Alpha | 3
A | Beta | 1
A | Beta | 2
B | Alpha | 1
B | Alpha | 2
B | Theta | 1
db<>fiddle here
To get a unique_id for each pair of name-category, remove the PARTITION part and use only ORDER BY:
SELECT
    name,
    category,
    DENSE_RANK() OVER (ORDER BY name, category) unique_id
FROM
    test;
name | category | unique_id
:--- | :------- | --------:
A | Alpha | 1
A | Alpha | 1
A | Alpha | 1
A | Beta | 2
A | Beta | 2
B | Alpha | 3
B | Alpha | 3
B | Theta | 4
db<>fiddle here
To select two rows of each unique_id: (As far as I've not used any order it is supposed to be random)
WITH ct AS
(
  SELECT
      name,
      category,
      DENSE_RANK() OVER (ORDER BY name, category) unique_id
  FROM
      test
)
SELECT
    a.unique_id, c.name, c.category
FROM
    (SELECT DISTINCT unique_id
     FROM ct) a
JOIN LATERAL (SELECT b.name, b.category
              FROM ct b
              WHERE b.unique_id = a.unique_id
              LIMIT 2) c ON TRUE
ORDER BY
    a.unique_id;
unique_id | name | category
--------: | :--- | :-------
        1 | A | Alpha
        1 | A | Alpha
        2 | A | Beta
        2 | A | Beta
        3 | B | Alpha
        3 | B | Alpha
        4 | B | Theta


Your Answer

Answers (2)

If you’re wondering how to order and partition by multiple columns in SQL, it’s actually quite straightforward. Here’s a breakdown of how you can do it:

Understand the PARTITION BY Clause:

  • The PARTITION BY clause is used to divide the result set into groups (partitions) based on one or more columns. Each partition is treated independently for aggregate functions or window calculations.

Use ORDER BY Inside Window Functions:

  • The ORDER BY clause defines the order of rows within each partition. This is typically used for ranking functions (ROW_NUMBER(), RANK(), etc.) or cumulative calculations.

Syntax for Multiple Columns:

  • You can specify multiple columns in both PARTITION BY and ORDER BY. For example:

SELECT 
    column1,
    column2,
    column3,
    ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY column3 DESC) AS row_num
FROM table_name;

In this example:

  • Rows are grouped into partitions based on column1 and column2.
  • Within each partition, rows are ordered by column3 in descending order.

Best Practices:

  • Ensure the columns in PARTITION BY make sense for your grouping logic.
  • Be explicit with the ORDER BY direction (ASC or DESC) to avoid confusion.

When to Use:

  • Use this when you need calculations or rankings within subsets of your data, like finding the latest transaction per customer or ranking products by sales within categories.

By combining PARTITION BY and ORDER BY effectively, you can handle complex data analysis tasks with ease! Let me know if you’d like an example tailored to your specific use case.

1 Day

To order and partition by multiple columns, you can use the ORDER BY and PARTITION BY clauses in your SQL query. Here's how you can do it:

SELECT column1, column2, column3
FROM your_table
ORDER BY column1, column2
PARTITION BY column3;

In this example, column1 and column2 are used for sorting the rows, and column3 is used for partitioning the result set. This means that within each partition defined by the values of column3, the rows will be ordered based on the values of column1 and column2.









8 Months

Interviews

Parent Categories