T-SQL - What's the most efficient way to loop through a table until a condition is met?
I got a programming task in the area of T-SQL. Task: People want to get inside an elevator; every person has a certain weight. The order of the people waiting in line is determined by the column turn. The elevator has a max capacity of <= 1000 lbs. Return the name of the last person who is able to enter the elevator before it gets too heavy! The return type should be a table.
Question: What is the most efficient way to solve this problem? If looping is the right approach, is there any room for improvement?
I used a loop and #temp tables; here is my solution:
set rowcount 0
-- THE SOURCE TABLE "LINE" HAS THE SAME SCHEMA AS #RESULT AND #TEMP
use Northwind
go

declare @sum int
declare @curr int
set @sum = 0
declare @id int

IF OBJECT_ID('tempdb..#temp','u') IS NOT NULL DROP TABLE #temp
IF OBJECT_ID('tempdb..#result','u') IS NOT NULL DROP TABLE #result

create table #result(
    id int not null,
    [name] varchar(255) not null,
    weight int not null,
    turn int not null
)

create table #temp(
    id int not null,
    [name] varchar(255) not null,
    weight int not null,
    turn int not null
)

INSERT into #temp
SELECT * FROM line order by turn

WHILE EXISTS (SELECT 1 FROM #temp)
BEGIN
    -- Get the top record
    SELECT TOP 1 @curr = r.weight FROM #temp r order by turn
    SELECT TOP 1 @id = r.id FROM #temp r order by turn
    --print @curr
    print @sum

    IF(@sum + @curr <= 1000)
    BEGIN
        print 'entering........ again'
        --print @curr
        set @sum = @sum + @curr
        --print @sum
        INSERT INTO #result
        SELECT * FROM #temp where [id] = @id --id, [name], turn
        DELETE FROM #temp WHERE id = @id
    END
    ELSE
    BEGIN
        print 'breaking.-----'
        BREAK
    END
END

SELECT TOP 1 [name] FROM #result r order by r.turn desc
Here is the CREATE script for the table; I used Northwind for testing:
USE [Northwind]
GO
/****** Object: Table [dbo].[line] Script Date: 28.05.2018 21:56:18 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[line](
    [id] [int] NOT NULL,
    [name] [varchar](255) NOT NULL,
    [weight] [int] NOT NULL,
    [turn] [int] NOT NULL,
    PRIMARY KEY CLUSTERED
    (
        [id] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
    UNIQUE NONCLUSTERED
    (
        [turn] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[line] WITH CHECK ADD CHECK (([weight]>(0)))
GO

INSERT INTO [dbo].[line] ([id], [name], [weight], [turn])
VALUES (5, 'gary',   800, 1),
       (3, 'jo',     350, 2),
       (6, 'thomas', 400, 3),
       (2, 'will',   200, 4),
       (4, 'mark',   175, 5),
       (1, 'james',  100, 6);
You should generally try to avoid looping through tables in SQL Server. Loops are normally less efficient than set-based solutions, as well as less readable.
The query below should be pretty efficient - even more so if the name and weight columns could be INCLUDE-d in the index on turn to avoid the key lookups. It can scan the unique index in order of turn and calculate the running total of the weight column, then use LEAD with the same ordering criteria to see what the running total in the next row will be. As soon as it finds the first row where this exceeds 1000 or is NULL (indicating there is no next row), it can stop the scan.
WITH T1
     AS (SELECT *,
                SUM(weight) OVER (ORDER BY turn ROWS UNBOUNDED PRECEDING) AS cume_weight
         FROM   [dbo].[line]),
     T2
     AS (SELECT LEAD(cume_weight) OVER (ORDER BY turn) AS next_cume_weight,
                *
         FROM   T1)
SELECT TOP 1 name
FROM   T2
WHERE  next_cume_weight > 1000
        OR next_cume_weight IS NULL
ORDER  BY turn
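As mentioned above, the key lookups disappear if the index on turn covers the query. A minimal sketch of such an index is below; the index name is made up, and it overlaps with the existing UNIQUE constraint on turn, so in practice you would probably create it instead of that constraint rather than alongside it:

-- Hypothetical covering index: keyed on turn, with the columns the query
-- needs INCLUDE-d so the ordered scan does not require key lookups.
CREATE UNIQUE NONCLUSTERED INDEX IX_line_turn_covering
ON [dbo].[line] ([turn])
INCLUDE ([name], [weight]);

With an index like this the plan can feed the window aggregates from a single ordered scan of the index.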
Execution Plan
In practice it seems to read a few rows ahead of what is strictly necessary - it looks like each window spool/stream aggregate pair causes two additional rows to be read. For the sample data in the question it would ideally only need to read two rows from the index scan, but in reality it reads 6. This is not a significant efficiency issue, though, and it does not degrade as more rows are added to the table.

For those interested in this issue, an image with the rows output by each operator (as shown by the query_trace_column_values extended event) is below. The rows are output in row_id order (starting at 47 for the first row read by the index scan and finishing at 113 for the TOP). Click the image below to make it larger, or alternatively see the animated version to make the flow easier to follow.

Pausing the animation at the point where the right-hand stream aggregate has emitted its first row (for gary - turn = 1), it seems apparent that it was waiting to receive its first row with a different WindowCount (for jo - turn = 2). And the window spool doesn't release the first "jo" row until it has read the next row with a different turn (for thomas - turn = 3). So the window spool and stream aggregate each cause an additional row to be read, and there are four of these operators in the plan - hence 4 additional rows.

An explanation of the columns shown above follows (a sketch of an event session for capturing this yourself is included after the list):
- NodeName: Index Scan, NodeId: 15, ColumnName: id - base table column covered by the index
- NodeName: Index Scan, NodeId: 15, ColumnName: turn - base table column covered by the index
- NodeName: Clustered Index Seek, NodeId: 17, ColumnName: weight - base table column retrieved from the lookup
- NodeName: Clustered Index Seek, NodeId: 17, ColumnName: name - base table column retrieved from the lookup
- NodeName: Segment, NodeId: 13, ColumnName: Segment1010 - returns 1 at the start of a new group, or NULL otherwise. As there is no PARTITION BY in the SUM, only the first row gets 1
- NodeName: Sequence Project, NodeId: 12, ColumnName: RowNumber1009 - row_number() within the group indicated by the Segment1010 flag. As all rows are in the same group, this is ascending integers from 1 to 6. It would be used for filtering the frame rows in cases like ROWS BETWEEN 5 PRECEDING AND 2 FOLLOWING (or, as here, for the LEAD later)
- NodeName: Segment, NodeId: 11, ColumnName: Segment1011 - returns 1 at the start of a new group, or NULL otherwise. As there is no PARTITION BY in the SUM, only the first row gets 1 (same as Segment1010)
- NodeName: Window Spool, NodeId: 10, ColumnName: WindowCount1012 - attribute that groups together rows belonging to a window frame. This window spool is using the "fast track" case for UNBOUNDED PRECEDING, where it emits two rows per source row: one with the cumulative values and one with the detail values. Though there is no visible difference in the rows exposed by query_trace_column_values, I assume the cumulative columns are there in reality.
- NodeName: Stream Aggregate, NodeId: 9, ColumnName: Expr1004 - COUNT(*) grouped by WindowCount1012 according to the plan, but actually a running count
- NodeName: Stream Aggregate, NodeId: 9, ColumnName: Expr1005 - SUM(weight) grouped by WindowCount1012 according to the plan, but actually the running sum of weight (i.e. cume_weight)
- NodeName: Segment, NodeId: 7, ColumnName: Expr1002 - CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005] END. I don't see how COUNT(*) can be 0 here, so this will always be the running sum (cume_weight)
- NodeName: Segment, NodeId: 7, ColumnName: Segment1013 - no PARTITION BY on the LEAD, so the first row gets 1 and all remaining rows get NULL
- NodeName: Sequence Project, NodeId: 6, ColumnName: RowNumber1006 - row_number() within the group indicated by the Segment1013 flag. As all rows are in the same group, this is ascending integers from 1 to 4
- NodeName: Segment, NodeId: 4, ColumnName: BottomRowNumber1008 - RowNumber1006 + 1, as the LEAD requires the single next row
- NodeName: Segment, NodeId: 4, ColumnName: TopRowNumber1007 - RowNumber1006 + 1, as the LEAD requires the single next row
- NodeName: Segment, NodeId: 4, ColumnName: Segment1014 - no PARTITION BY on the LEAD, so the first row gets 1 and all remaining rows get NULL
- NodeName: Window Spool, NodeId: 3, ColumnName: WindowCount1015 - attribute that groups together rows belonging to a window frame, using the previous row numbers. The window frame for LEAD has at most 2 rows (the current one and the next one)
- NodeName: Stream Aggregate, NodeId: 2, ColumnName: Expr1003 - LAST_VALUE([Expr1002]) for the LEAD(cume_weight)
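For anyone wanting to reproduce the column-level trace used above, a minimal sketch of an extended event session capturing query_trace_column_values is below. The session and file names are made up, the event lives in the debug channel (so availability and overhead can vary by build), and it can generate a lot of output - only use something like this for a single test query on a test instance.

-- Sketch only: capture query_trace_column_values to an event file.
CREATE EVENT SESSION [column_values] ON SERVER
ADD EVENT sqlserver.query_trace_column_values
ADD TARGET package0.event_file (SET filename = N'column_values')
WITH (MAX_DISPATCH_LATENCY = 1 SECONDS);
GO
ALTER EVENT SESSION [column_values] ON SERVER STATE = START;
GO
-- ... run the query above, then stop the session and inspect the .xel file ...
ALTER EVENT SESSION [column_values] ON SERVER STATE = STOP;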
Hope this helps!