How to order and partition by multiple columns?
I am using PostgreSQL.
I have a table with 2 columns in this example:
I want to add a new column with a unique id corresponding to partitions by name and category as shown in the result. Then, I want to take a random sample choosing 2 (or more) unique ids because under each unique id, there will be a lot of other historical data.
I have tried this so far, but I get 1s for everything. I'm missing something really simple here, but how do I correct my mistake? I have to do this operation for several million rows in the real table.
SELECT
dense_rank() over (partition by name, category order by category) as unique_id,
*
FROM
example_table
After this, presumably I'll have to use RAND() somewhere but how do I do this?
This is my naive approach to get the solution in the above pic.
with ranks (
...
)
select * from ranks where unique_id = 2 or unique_id = 4
Use ROW_NUMBER() function better for order and partition by multiple columns.
SELECT
name,
category,
ROW_NUMBER() OVER (PARTITION BY name, category ORDER BY name, category) unique_id
FROM
test;
name | category | unique_id
:--- | :------- | --------:
A | Alpha | 1
A | Alpha | 2
A | Alpha | 3
A | Beta | 1
A | Beta | 2
B | Alpha | 1
B | Alpha | 2
B | Theta | 1
db<>fiddle here
To get a unique_id for each pair of name-category, remove the PARTITION part and use only ORDER BY:
SELECT
name,
category,
DENSE_RANK() OVER (ORDER BY name, category) unique_id
FROM
test;
name | category | unique_id
:--- | :------- | --------:
A | Alpha | 1
A | Alpha | 1
A | Alpha | 1
A | Beta | 2
A | Beta | 2
B | Alpha | 3
B | Alpha | 3
B | Theta | 4
db<>fiddle here
To select two rows of each unique_id: (As far as I've not used any order it is supposed to be random)
WITH ct AS
(
SELECT
name,
category,
DENSE_RANK() OVER (ORDER BY name, category) unique_id
FROM
test
)
SELECT
a.unique_id, c.name, c.category
FROM
(SELECT DISTINCT unique_id
FROM ct) a
JOIN LATERAL (SELECT b.name, b.category
FROM ct b
WHERE b.unique_id = a.unique_id
LIMIT 2) c ON TRUE
ORDER BY
a.unique_id;
unique_id | name | category
--------: | :--- | :-------
1 | A | Alpha
1 | A | Alpha
2 | A | Beta
2 | A | Beta
3 | B | Alpha
3 | B | Alpha
4 | B | Theta