Explain postgres update select with subquery referencing the main table?
I'm trying to understand how to filter my subquery in context to the main query. Ultimately I'm trying to get the MAX value from the latest record BEFORE the date of the current record. Here is where I'm at.
UPDATE Opphistory t
SET MaxStageSortOrder = sub.max_snapshotdate
FROM (
SELECT opportunityid, max(snapshotdate) AS max_snapshotdate
FROM Opphistory
WHERE forecastcategory <> 'Omitted' and snapshotdate <= t.snapshotdate
GROUP BY 1
) sub
WHERE t.opportunityid = sub.opportunityid
It is the snapshotdate <= t.snapshotdate that appears to fail.
Seems like you were aiming for a correlated subquery:
UPDATE opphistory t
SET MaxStageSortOrder = (
SELECT max(snapshotdate)
FROM Opphistory t1
WHERE t1.opportunityid = t.opportunityid
AND t1.snapshotdate < t.snapshotdate
AND t1.forecastcategory <> 'Omitted'
);
A derived table in the FROM clause of an UPDATE cannot reference columns of the main table. Such a reference requires a correlated subquery or a LATERAL subquery. But, unfortunately, table expressions in the FROM clause of an UPDATE are (at least up to pg 14) always joined with a CROSS JOIN (effectively) to the main table, so a LATERAL subquery there cannot refer to the main table directly either. You can work around this limitation by repeating the main table in the FROM clause, binding that copy one-to-one to the main table, and then joining to the "proxy" with any join type. In the case at hand, we don't even need that:
UPDATE Opphistory t
SET MaxStageSortOrder = sub.max_snapshotdate
FROM (
SELECT opphistory_id
, max(snapshotdate) OVER (PARTITION BY opportunityid ORDER BY snapshotdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS max_snapshotdate
FROM Opphistory
WHERE forecastcategory <> 'Omitted'
) sub
WHERE t.opphistory_id = sub.opphistory_id
AND t.MaxStageSortOrder IS DISTINCT FROM sub.max_snapshotdate;
opphistory_id is supposed to be the PRIMARY KEY of the table.
I expect the second query to be substantially faster for big tables, as it (probably) only needs a single sequential scan and a sort to compute the running max for all rows. Running a correlated subquery for every row (like in the first query) has its price.
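To check that expectation on your own data, one option is to compare the execution plans. This is a minimal sketch, assuming the table and column names used in the queries above. Note that EXPLAIN ANALYZE actually executes the UPDATE, so the statement is wrapped in a transaction that gets rolled back:

```sql
BEGIN;

-- Plan and timing for the correlated-subquery variant.
-- ANALYZE runs the statement for real; BUFFERS adds I/O statistics.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE opphistory t
SET    maxstagesortorder = (
   SELECT max(t1.snapshotdate)
   FROM   opphistory t1
   WHERE  t1.opportunityid = t.opportunityid
   AND    t1.snapshotdate  < t.snapshotdate
   AND    t1.forecastcategory <> 'Omitted'
   );

ROLLBACK;  -- discard the changes made while measuring
```

Repeating the same BEGIN / EXPLAIN / ROLLBACK wrapper around the window-function variant gives the second plan to compare against.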
About ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, see the related answer "Reduce results into accumulated groups".
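To see what this frame does, here is a small self-contained sketch on made-up dates. The VALUES list and the alias max_before are illustrative and not taken from the question. The frame ends one row before the current row, so the current row never counts, and the first row has an empty frame:

```sql
SELECT snapshotdate
     , max(snapshotdate) OVER (ORDER BY snapshotdate
                               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS max_before
FROM  (VALUES (date '2021-01-01')
            , (date '2021-02-01')
            , (date '2021-03-01')) v(snapshotdate)
ORDER  BY snapshotdate;

-- snapshotdate | max_before
-- 2021-01-01   | NULL         (empty frame: no preceding rows)
-- 2021-02-01   | 2021-01-01
-- 2021-03-01   | 2021-02-01
```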
Note subtle differences:
The first query updates every row, no matter what.
The second query omits ...
... rows with forecastcategory = 'Omitted' and rows with forecastcategory IS NULL
(The first query only excludes those from the max computation.)
... rows with opportunityid IS NULL
... rows that would not change - due to the added last line.
If (opportunityid, snapshotdate) is not defined UNIQUE NOT NULL, we may have to do more, starting with a definition of how to deal with duplicates and NULL values.
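For completeness, the workaround mentioned earlier (repeating the main table in the FROM clause and binding it one-to-one to the main table) could be sketched as follows. This is only an illustration, assuming opphistory_id is the primary key; the aliases t0 and t1 are made up here:

```sql
UPDATE opphistory t
SET    maxstagesortorder = sub.max_snapshotdate
FROM   opphistory t0                        -- "proxy": second instance of the main table
LEFT   JOIN LATERAL (                       -- may reference t0, but not t
   SELECT max(t1.snapshotdate) AS max_snapshotdate
   FROM   opphistory t1
   WHERE  t1.opportunityid = t0.opportunityid
   AND    t1.snapshotdate  < t0.snapshotdate
   AND    t1.forecastcategory <> 'Omitted'
   ) sub ON true
WHERE  t0.opphistory_id = t.opphistory_id;  -- bind the proxy one-to-one to the main table
```

Like the correlated subquery, this computes the max per row, so it is not expected to beat the window-function variant here. It is mainly useful when the per-row logic cannot be expressed with a window function.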
It's not clear from your question what exactly you want.
If you really just want the snapshot date from the "previous" row, consider the simpler window function lag():
UPDATE Opphistory t
SET MaxStageSortOrder = sub.max_snapshotdate
FROM (
SELECT opphistory_id
, lag(snapshotdate) OVER (PARTITION BY opportunityid ORDER BY snapshotdate) AS max_snapshotdate
FROM Opphistory
WHERE forecastcategory <> 'Omitted'
) sub
WHERE t.opphistory_id = sub.opphistory_id
AND t.MaxStageSortOrder IS DISTINCT FROM sub.max_snapshotdate;
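Before running either UPDATE, it may help to preview both computations side by side with a plain SELECT. This is a sketch under the same assumptions as above; the output aliases max_before and prev_date and the window name w are made up for illustration:

```sql
SELECT opphistory_id
     , opportunityid
     , snapshotdate
     , max(snapshotdate) OVER (w ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS max_before
     , lag(snapshotdate) OVER w AS prev_date
FROM   opphistory
WHERE  forecastcategory <> 'Omitted'
WINDOW w AS (PARTITION BY opportunityid ORDER BY snapshotdate)  -- shared partition and sort order
ORDER  BY opportunityid, snapshotdate;
```

Since the rows are sorted by the same column that is aggregated, the two columns should agree wherever snapshotdate is not NULL. Rows where they differ point to the NULL handling that needs a decision first.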