What is the best way to order by random?

653    Asked by ankur_3579 in SQL Server , Asked on Oct 6, 2022

I have a query where I want the resulting records to be ordered randomly. It uses a clustered index, so if I do not include an order by it will likely return records in the order of that index. How can I ensure a random row order?


I understand that it will likely not be "truly" random, pseudo-random is good enough for my needs.

Answered by Ankesh Kumar

Many tables have a relatively dense (few missing values) indexed numeric ID column. This allows us to determine the range of existing values, and choose rows using randomly-generated ID values in that range. This works best when the number of rows to be returned is relatively small, and the range of ID values is densely populated (so the chance of generating a missing value is small enough).

  To illustrate order by random, the following code chooses 100 distinct random users from the Stack Overflow table of users, which has 8,123,937 rows.

The first step is to determine the range of ID values, an efficient operation due to the index:

DECLARE 
    @MinID integer,
    @Range integer,
    @Rows bigint = 100;
--- Find the range of values
SELECT
    @MinID = MIN(U.Id),
    @Range = 1 + MAX(U.Id) - MIN(U.Id)
FROM dbo.Users AS U;
The plan reads one row from each end of the index.
Now we generate 100 distinct random IDs in the range (with matching rows in the users table) and return those rows:
WITH Random (ID) AS
(
    -- Find @Rows distinct random user IDs that exist
    SELECT DISTINCT TOP (@Rows)
        Random.ID
    FROM dbo.Users AS U
    CROSS APPLY
    (
        -- Random ID
        VALUES (@MinID + (CONVERT(integer, CRYPT_GEN_RANDOM(4)) % @Range))
    ) AS Random (ID)
    WHERE EXISTS
    (
        SELECT 1
        FROM dbo.Users AS U2
            -- Ensure the row continues to exist
            WITH (REPEATABLEREAD)
        WHERE U2.Id = Random.ID
    )
)
SELECT
    U3.Id,
    U3.DisplayName,
    U3.CreationDate
FROM Random AS R
JOIN dbo.Users AS U3
    ON U3.Id = R.ID
-- QO model hint required to get a non-blocking flow distinct
OPTION (MAXDOP 1, USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
The plan shows that in this case 601 random numbers were needed to find 100 matching rows. It is pretty quick:
Table 'Users'. Scan count 1, logical reads 1937, physical reads 2, read-ahead reads 408
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 9 ms.


Your Answer

Interviews

Parent Categories