What is the update limit postgres?

441 Asked by AndreaBailey in SQL Server , Asked on Sep 29, 2022

I have a Postgres database which contains details on clusters of servers, such as server status ('active', 'standby' etc). Active servers at any time may need to fail over to a standby, and I don't care which standby is used in particular.

I want a database query to change the status of a standby - JUST ONE - and return the server IP that is to be used. The pick can be arbitrary: since the status of the server changes with the query, it doesn't matter which standby is selected.

Is it possible to limit my query to just one update?

Here is what I have so far:
UPDATE server_info SET status = 'active' 
WHERE status = 'standby' [[LIMIT 1???]] 
RETURNING server_ip;

Postgres doesn't like this. What could I do differently?

Answered by Amit Sinha

The update limit postgres is -

Without concurrent write access

Materialise a selection in a CTE (Common Table Expressions) and join to it in the FROM clause of the UPDATE.

WITH cte AS (

   SELECT server_ip          -- pk column or any (set of) unique column(s)

   FROM   server_info

   WHERE  status = 'standby'

   LIMIT  1                  -- arbitrary pick (cheapest)

   )

UPDATE server_info s

SET    status = 'active' 

FROM   cte

WHERE  s.server_ip = cte.server_ip

RETURNING s.server_ip;

I originally had a plain subquery here, but that can sidestep the LIMIT for certain query plans as Feike pointed out:

The planner may choose to generate a plan that executes a nested loop over the LIMITing subquery, causing more UPDATEs than LIMIT, e.g.:

 Update on buganalysis [...] rows=5

   ->  Nested Loop

         ->  Seq Scan on buganalysis

         ->  Subquery Scan on sub [...] loops=11

               ->  Limit [...] rows=2

                     ->  LockRows

                           ->  Sort

                                 ->  Seq Scan on buganalysis

Reproducing test case

The way to fix the above was to wrap the LIMIT subquery in its own CTE, as the CTE is materialised it will not return different results on different iterations of the nested loop.

Or use a lowly correlated subquery for the simple case with LIMIT 1. Simpler, faster:

UPDATE server_info

SET    status = 'active' 

WHERE  server_ip = (

         SELECT server_ip

         FROM   server_info

         WHERE  status = 'standby'

         LIMIT  1

         )

RETURNING server_ip;

With concurrent write access

Assuming default isolation level READ COMMITTED for all of this. Stricter isolation levels (REPEATABLE READ and SERIALIZABLE) may still result in serialisation errors. See:

  SELECT … FOR UPDATE SKIP LOCKED in REPEATABLE READ transactions

Under concurrent write load, add FOR UPDATE SKIP LOCKED to lock the row to avoid race conditions. SKIP LOCKED was added in Postgres 9.5, for older versions see below. The manual:

With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.

UPDATE server_info

SET    status = 'active' 

WHERE  server_ip = (

         SELECT server_ip

         FROM   server_info

         WHERE  status = 'standby'

         LIMIT  1

         FOR    UPDATE SKIP LOCKED

         )

RETURNING server_ip;

If there is no qualifying, unlocked row left, nothing happens in this query (no row is updated) and you get an empty result. For uncritical operations that means you are done. However, concurrent transactions may have locked rows, but then don't finish the update (ROLLBACK or other reasons). To be sure run a final check:

SELECT NOT EXISTS (

   SELECT FROM server_info

   WHERE  status = 'standby'

   );

  SELECT also sees locked rows. While that doesn't return true, one or more rows are still unfinished and transactions could still be rolled back. (Or new rows have been added meanwhile.) Wait a bit, then loop the two steps: (UPDATE till you get no row back; SELECT ...) until you get true.

Atomic UPDATE .. SELECT in Postgres

Without SKIP LOCKED in PostgreSQL 9.4 or older

UPDATE server_info

SET    status = 'active' 

WHERE  server_ip = (

         SELECT server_ip

         FROM   server_info

         WHERE  status = 'standby'

         LIMIT  1

         FOR    UPDATE

         )

RETURNING server_ip;

Concurrent transactions trying to lock the same row are blocked until the first one releases its lock.

If the first was rolled back, the next transaction takes the lock and proceeds normally; others in the queue keep waiting.

If the first committed, the WHERE condition is re-evaluated and if it's not TRUE any more (status has changed) the CTE (somewhat surprisingly) returns no row. Nothing happens. That's the desired behaviour when all transactions want to update the same row.

But not when each transaction wants to update the next row. And since we just want to update an arbitrary (or random) row, there is no point in waiting at all.

We can unblock the situation with the help of advisory locks:

UPDATE server_info

SET    status = 'active' 

WHERE  server_ip = (

         SELECT server_ip

         FROM   server_info

         WHERE  status = 'standby'

         AND    pg_try_advisory_xact_lock(id)

         LIMIT  1

         FOR    UPDATE

         )

RETURNING server_ip;

This way, the next row not locked yet will be updated. Each transaction gets a fresh row to work with. I had help from Czech Postgres Wiki for this trick.

id being any unique bigint column (or any type with an implicit cast like int4 or int2).

If advisory locks are in use for multiple tables in your database concurrently, disambiguate with pg_try_advisory_xact_lock(tableoid::int, id) - id being a unique integer here.

Since tableoid is a bigint quantity, it can theoretically overflow integers. If you are paranoid enough, use (tableoid::bigint !47483648)::int instead - leaving a theoretical "hash collision" for the truly paranoid ...

Also, Postgres is free to test WHERE conditions in any order. It could test pg_try_advisory_xact_lock() and acquire a lock before status = 'standby', which could result in additional advisory locks on unrelated rows, where status = 'standby' is not true. Related question on SO:

Postgres pg_try_advisory_lock blocks all records

Typically, you can just ignore this. To guarantee that only qualifying rows are locked, you could nest the predicate(s) in a CTE like above or a subquery with the OFFSET 0 hack (prevents inlining). Example:

  Put pg_try_advisory_xact_lock() in a nested subquery?

Or (cheaper for sequential scans) nest the conditions in a CASE statement like:

WHERE  CASE WHEN status = 'standby' THEN pg_try_advisory_xact_lock(id) END

However the CASE trick would also keep Postgres from using an index on status. If such an index is available, you don't need extra nesting to begin with: only qualifying rows will be locked in an index scan.

Since you cannot be sure that an index is used in every call, you could just:

WHERE  status = 'standby'

AND    CASE WHEN status = 'standby' THEN pg_try_advisory_xact_lock(id) END

The CASE is logically redundant, but it serves the discussed purpose.

If the command is part of a long transaction, consider session-level locks that can be (and have to be) released manually. So you can unlock as soon as you are done with the locked row: pg_try_advisory_lock() and pg_advisory_unlock(). The manual: Once acquired at session level, an advisory lock is held until explicitly released or the session ends.

What is the update limit postgres?

Your Answer