Desktop version

Home arrow Computer Science arrow Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems

Characterizing write skew

This anomaly is called write skew [28]. It is neither a dirty write nor a lost update, because the two transactions are updating two different objects (Alice’s and Bob’s on- call records, respectively). It is less obvious that a conflict occurred here, but it’s definitely a race condition: if the two transactions had run one after another, the second doctor would have been prevented from going off call. The anomalous behavior was only possible because the transactions ran concurrently.

You can think of write skew as a generalization of the lost update problem. Write skew can occur if two transactions read the same objects, and then update some of those objects (different transactions may update different objects). In the special case where different transactions update the same object, you get a dirty write or lost update anomaly (depending on the timing).

We saw that there are various different ways of preventing lost updates. With write skew, our options are more restricted:

  • • Atomic single-object operations don’t help, as multiple objects are involved.
  • • The automatic detection of lost updates that you find in some implementations of snapshot isolation unfortunately doesn’t help either: write skew is not automatically detected in PostgreSQL’s repeatable read, MySQL/InnoDB’s repeatable read, Oracle’s serializable, or SQL Server’s snapshot isolation level [23]. Automatically preventing write skew requires true serializable isolation (see “Serializa- bility” on page 251).
  • • Some databases allow you to configure constraints, which are then enforced by the database (e.g., uniqueness, foreign key constraints, or restrictions on a particular value). However, in order to specify that at least one doctor must be on call, you would need a constraint that involves multiple objects. Most databases do not have built-in support for such constraints, but you may be able to implement them with triggers or materialized views, depending on the database [42].
  • • If you can’t use a serializable isolation level, the second-best option in this case is probably to explicitly lock the rows that the transaction depends on. In the doctors example, you could write something like the following:

BEGIN TRANSACTION;

SELECT * FROM doctors WHERE on_catl = true AND shift_id = 1234 FOR UPDATE; О

UPDATE doctors

SET on_call = false WHERE name = 'Alice'

AND shift_id = 1234;

COMMIT;

О As before, FOR UPDATE tells the database to lock all rows returned by this

query.

 
Source
< Prev   CONTENTS   Source   Next >

Related topics