Optimistic Concurrency Control

Transaction Refresher

Statement of problem:

 

A transaction schedule:

T0 T1
read(A)
A = A - 50
write(A)
read(A)
temp = A*0.1
A = A - temp
write(A)
read(B)
B = B + 50
write(B)
read(B)
B = B + temp
write(B)

 

The system "understands" only reads and writes; cannot assume any semantics of other operations.

Arbitrary interleaving can lead to:

 

Some definitions:

  1. The set of transactions that participate in S1 and S2 are the same.
  2. For each data item Q in S1, if transaction Ti executes read(Q) and the value of Q read by Ti was written by Tj, then the same will hold in S2. [reads are all the same]
  3. For each data item Q in S1, if transaction Ti executes the last write(Q) instruction, then the same holds in S2. [the same writers "win"]

 

One way to think about concurrency control – in terms of dependencies:

  1. T1 reads N ... T2 writes N: a RW dependency
  2. T1 writes N ... T2 reads N: a WR dependency
  3. T1 writes N ... T2 writes N: a WW dependency

Can construct a "serialization graph" for a schedule S (SG(S)):

Theorem: A schedule S is serializable iff SG(S) is acyclic.
 

Locking

A technique to ensure serializability, but hopefully preserve high concurrency as well. The winner in industry.
Basics:

Multiple lock modes: Some data items can be shared, so not all locks need to be exclusive.
Lock compatibility table 1:
Assume two lock modes: shared (S) and exclusive (X) locks.

  S X
S T F
X F F

If you request a lock in a mode incompatible with an existing lock, you must wait. 

Two-Phase Locking (2PL):

Theorem: If all xacts are well-formed and follow 2PL, then any resulting schedule is serializable
(note: this is if, not if and only if!)
 

Some implementation background

Well, starvation, unfair, etc. Could do a little of that, but not much. So...

Kung & Robinson

Attractive, simple idea: optimize case where conflict is rare.

Basic idea: all transactions consist of three phases:

  1. Read. Here, all writes are to private storage (shadow copies).
  2. Validation. Make sure no conflicts have occurred.
  3. Write. If Validation was successful, make writes public. (If not, abort!)

When might this make sense? Three examples:

  1. All transactions are readers.
  2. Lots of transactions, each accessing/modifying only a small amount of data, large total amount of data.
  3. Fraction of transaction execution in which conflicts "really take place" is small compared to total pathlength.

The Validation Phase

  1. Assign each transaction a TN during execution.
  2. Ensure that if you run transactions in order induced by "<" on TNs, you get an equivalent serial schedule.

Suppose TN(Ti) < TN(Tj). Then if one of the following three conditions holds, it’s serializable:

    1. Ti completes its write phase before Tj starts its read phase.
    2. WS(Ti) ∩ RS(Tj) = ∅ and Ti completes its write phase before Tj starts its write phase.
    3. WS(Ti) ∩ RS(Tj) = ∅ and WS(Ti) ∩ WS(Tj) = ∅ and Ti completes its read phase before Tj completes its read phase.

Is this correct? Each condition guarantees that the three possible classes of conflicts (W-R, R-W, W-W) go "one way" only: higher transaction id (j) depends on lower (i), but not vice versa. (When we speak of conflicts below, they are implicitly ordered i then j.)

  1. For condition 1 this is obvious (true serial execution!)
  2. Interleave Wi/Rj, but ensure no WR conflict. RW and forward WW allowed.
  3. interleave Wi/Rj or Wi/Wj. I.e. forbid only Ri/Wj interleaving, AND ensure all conflicts are RW

Assigning TN's: at beginning of transactions is not optimistic; do it at end of read phase. Note: this satisfies second half of condition (3).

Note: a transaction T with a very long read phase must check write sets of all transactions begun and finished while T was active.  This could require unbounded buffer space.
Solution: bound buffer space, toss out when full, abort transactions that could be affected.

Serial Validation

Only checks properties (1) and (2), since writes are not going to be interleaved.

Simple technique: make a critical section around <get xactno; validate (1) or (2) for everybody from your start to finish; write>. Not great if:

Improvement to speed up validation:

repeat as often as you want {

}

<get xactno; validate with new xacts; write>.

Note: read-only xacts don’t need to get xactnos! Just need to validate up to highest xactno at end of read phase (without critical section!)

 
 

Parallel Validation

Want to allow interleaved writes.
Need to be able to check condition (3).

Small critical section.
Problems:

© 1998, Joseph M. Hellerstein.  Last modified 08/18/98.
Feedback welcomed.