ACID properties across partitioned Prevayler instances

Justin T. Sampson

First revision June 12, 2004; updated June 13, 2004.

Introduction

Operations executed by a single Prevayler instance are known to satisfy ACID properties. To handle an amount of data beyond a single machine's capabilities, multiple Prevayler instances may be configured on multiple machines. This note discusses maintaining ACID properties during operations on multiple Prevayler instances without implementing a central transaction manager.

ACID consists of the four properties of atomicity, consistency, isolation, and durability:

Atomicity means that the effects of any operation are either applied fully or not at all.

Consistency means that any logical constraints on the data are preserved by every operation.

Isolation means that no operation will see an inconsistent state caused by the concurrent execution of any other operation.

Durability means that once the success of an operation has been confirmed, its effects are guaranteed to not be lost (for some appropriate definition of "guaranteed", such as being on disk rather than only in memory).

Within a single Prevayler instance, atomicity is provided by any of several mechanisms: a "royal food taster" is provided to speculatively execute each update operation, allowing rollback of the entire prevalent system if a runtime exception is thrown; and an operation itself can be written to check preconditions before making changes as much as possible, or to maintain the ability to selectively undo its own changes. Consistency is the responsibility of the application logic within each operation, aided by the other ACID properties. Isolation is provided by executing all operations in strict sequence. Finally, durability is provided by journalling all update operations before execution.

Definitions

The following terms will be used throughout the following section:

A partition is a single Prevayler instance holding a portion of the application's data, not overlapping with any other partition. (Each partition may further be replicated to enhance fail-safety and increase query throughput, but such replication is not relevant to this discussion.)

A micro-operation is a single operation on a single partition.

A micro-query is a single query operation on a single partition, guaranteed to cause no changes to the data in that partition.

A micro-update is a single update operation on a single partition, which may cause changes to the data in that partition.

A macro-operation is a sequence of zero or more micro-operations, along with further processing based on the results of those micro-operations, corresponding to a single client request.

A macro-query is a macro-operation comprising zero or more micro-queries.

A macro-update is a macro-operation comprising zero or more micro-queries and zero or more micro-updates.

Design constraints to achieve ACID properties

The challenge now is to develop a set of design constraints that will achieve ACID properties for macro-operations, given that ACID is already satisfied for each micro-operation.

To achieve atomicity, we must guarantee that every macro-update contains at most one micro-update. Since a single micro-update is known to be atomic, if all of a macro-update's changes are confined to a single micro-update, then the macro-update is also atomic. If a macro-update contains more than one micro-update, then the changes from the first micro-update would be visible without the changes from the second micro-update, which would violate atomicity.

To achieve consistency, we must guarantee that each logical constraint is confined to a single partition. Since a single micro-operation is known to maintain consistency within its partition, any sequence of micro-operations will also maintain consistency within each partition, so that as long as there are no logical constraints spanning partitions, total system consistency is maintained as well.

To achieve isolation, we must guarantee that every macro-operation contains at most one micro-operation for any given partition, in addition to the preceding guarantee against inter-partition logical constraints. While each individual micro-operation is known to maintain consistency, two micro-operations on the same partition may not appear consistent if there is an intervening update to the same partition. However, if there are no inter-partition logical constraints, two micro-operations on different partitions can never appear inconsistent.

Durability is provided by each partition for the micro-operations it executes, and therefore the effects of any macro-operation are also durable.

In summary, we can achieve ACID properties across partitioned Prevayler instances by applying three design constraints affecting how we partition our application data and how we construct our micro- and macro-operations:

  1. Every macro-update must contain at most one micro-update.
  2. Each logical constraint must be confined to a single partition.
  3. Every macro-operation must contain at most one micro-operation for any given partition.

It is possible to define intra-partition logical constraints such that certain inter-partition logical constraints can be deduced, without violating the second design constraint. For example, consider a desired foreign-key constraint across two partitions. Objects of type A reside in one partition, each having a unique primary key. Objects of type B reside in a second partition, each holding a reference to an object of type A as a foreign key.

This inter-partition constraint can be accomplished by composing two intra-partition constraints along with suitable ordering of micro-operations. Suppose that it is practical to constrain objects of type A to never be deleted. Once created with a given primary key, an object of type A will never cease to exist, and so its primary key will always be valid. This is an intra-partition constraint solely on the first partition.

Now, we can introduce a constraint on the second partition that every reference to an object of type A must have been valid at some point in the past. Since such a constraint, once satisfied, cannot be invalidated by any changes to the first partition, its maintenance is therefore confined to the second partition, satisfying the design constraint.

Finally, to accomplish the desired inter-partition constraint, we must order micro-operations in the intuitive way. When creating an object of type B, we first issue a micro-query to the first partition to retrieve an appropriate key. We then issue a micro-update to the second partition containing that key, which is now known to have been valid at some point in the past. Therefore the new object of type B in the second partition is created with its constraint satisfied.

When querying for objects of type B and their associated objects of type A, we must first issue a micro-query to the second partition and only then issue a micro-query to the first partition with the desired keys. Due to the second partition's constraint, any key retrieved from an object of type B is known to have been valid at some point in the past; and due to the first partition's constraint, any key that was valid at some point in the past will continue to be valid forever.

If, instead, we were to query the first partition for a collection of objects of type A, and only then query the second partition for a collection of objects of type B, the intra-partition constraints would not be sufficient to guarantee that the references held by those B objects would be valid within the collection of queried A objects. Each reference is known to have been valid at some point in the past, but that is not strong enough to know whether the preceding query on the first partition was earlier or later than that point.

Comparisons

The definition of ACID given here is actually stronger than some definitions. The isolation property is often defined as simply requiring that one operation's effects are not seen by other operations until the first operation has completed successfully. This does not imply that every state seen by an operation is consistent with the system's constraints, because if one operation queries some data before and after another operation updates the same data, the two queries may be inconsistent with each other even though the update was only seen once it had been committed.

Strict serializability requires that the effects and results of all operations are equivalent to those that would occur if the operations followed strictly one after another. Both the common weaker ACID definition and the stronger ACID definition given here are weaker than strict serializability.

Consider the following sequence of SQL statements, executed in four concurrent transactions, T1–T4, where table A contains a single integer column X and table B contains a single integer column Y, and both tables initially contain a single row with value 0:

Assume that T2 and T4 commit their updates immediately. Then T1 will see A.X=0 and B.Y=1, while T3 will see A.X=1 and B.Y=0. This outcome is allowed by both definitions of ACID. Each query sees only committed changes, and since there are no constraints between the tables, either order is consistent. However, if the updates were strictly serialized, then the state A.X=0 and B.Y=1 could never have been seen. Therefore both definitions of ACID are weaker than strict serializability because they allow a transaction history that strict serializability would disallow.

Now, consider the following sequence of SQL statements, executed in four concurrent transactions, T1–T4, where table A contains a single integer column X and table B contains a single integer column Y, and both tables initially contain a single row with value 0, and there is a foreign-key constraint from B to A such that every value in B must occur in A:

Again, T2 and T4 commit their inserts immediately. Since T4's insert occurs after T2's, it does not violate the foreign-key constraint. T3 sees A containing two rows, with X=0 and X=1, and sees B containing one row, with Y=0, which is also consistent with the constraint. However, T1 sees A containing only one row with X=0 but sees B containing two rows, with Y=0 and Y=1, which does violate the constraint. Since T1 sees a constraint violated, this history violates the definition of ACID in terms of consistent reads, while it still does not violate the definition of ACID in terms of committed changes, because no uncommitted changes were seen. Therefore the definition of ACID in terms of uncommitted changes is weaker than the definition of ACID in terms of consistent reads given here.

It is interesting to compare these definitions of ACID properties to the transaction isolation levels defined for SQL. The 1992 ANSI standard was imprecise, allowing two interpretations of the isolation levels, one leading to incorrect behavior and the other being too restrictive. Atul Adya summarized the discussion in his Ph.D. thesis, along with proposals for better definitions. The definition of ACID in terms of committed changes is equivalent to the "preventative" interpretation of SQL's "read committed" isolation level, and therefore "read committed" isolation is weaker than the definition of ACID in terms of consistent reads given here. Note that in both example histories, each transaction only references each table once, so there are no nonrepeatable or phantom reads. The "anomaly" interpretation of SQL's "serializable" isolation level is that it merely prevents nonrepeatable and phantom reads, which both histories satisfy. Therefore the "anomaly" interpretation of the SQL's "serializable" isolation level is weaker than the definition of ACID in terms of consistent reads given here, and is also therefore weaker than true strict serializability. The "preventative" interpretation of SQL's "repeatable read" isolation level would forbid the first example history above, but not the second, and is therefore neither strictly stronger nor strictly weaker than the definition of ACID in terms of consistent reads. The "preventative" interpretation of SQL's "serializable" isolation level would forbid both, and is in fact equivalent to true strict serializability.