A transaction only succeeds if none of its reads are stale when the commit record is encountered

To optimize this process in cases where the view is small , the Corfuobject can create checkpoints and provide them to Corfu via a checkpoint call. Internally, Corfu stores these checkpoints on a separate shared log and accesses them when required on query_helper calls. Additionally, the object can forgo the ability to roll back before a checkpoint with a forget call, which allows Corfu to trim the log and reclaim storage capacity. The Corfu design enables other useful properties. Strongly consistent read throughput can be scaled simply by instantiating more views of the object on new clients. More reads translate into more check and read operations on the shared log, and scale linearly until the log is saturated. Additionally, objects with different in-memory data structures can share the same data on the log. For example, a name space can be represented by different trees, one ordered on the filename and the other on a directory hierarchy, allowing applications to perform two types of queries efficiently. We now substantiate our earlier claim that storing multiple objects on a single shared log enables strongly consistent operations across them without requiring complex distributed protocols. The Corfu runtime on each client can multiplex the log across objects by storing and checking a unique object ID on each entry; such a scheme has the drawback that every client has to play every entry in the shared log. For now, we make the assumption that each client hosts views for all objects in the system.

Later in the paper, we describe layered partitioning, which enables strongly consistent operations across objects without requiring each object to be hosted by each client,cultivo frambuesa and without requiring each client to consume the entire shared log. Many strongly consistent operations that are difficult to achieve in conventional distributed systems are trivial over a shared log. Applications can perform coordinated rollbacks or take consistent snapshots across many objects simply by creating views of each object synced up to the same offset in the shared log. This can be a key capability if a system has to be restored to an earlier state after a cascading corruption event. Another trivially achieved capability is remote mirroring; application state can be asynchronously mirrored to remote data centers by having a process at the remote site play the log and copy its contents. Since log order is maintained, the mirror is guaranteed to represent a consistent, system-wide snapshot of the primary at some point in the past. In Corfu, all these operations are implemented via simple appends and reads on the shared log. Corfu goes one step further and leverages the shared log to provide transactions within and across objects. It implements optimistic concurrency control by appending speculative transaction commit records to the shared log. Commit records ensure atomicity, since they determine a point in the persistent total ordering at which the changes that occur in a transaction can be made visible at all clients. To provide isolation, each commit record contains a read set: a list of objects read by the transaction along with their versions, where the version is simply the last offset in the shared log that modified the object.As a result, Corfu provides serializability with external consistency for transactions across objects.Corfu uses streams in an obvious way: each Corfu object is assigned its own dedicated stream.

If transactions never cross object boundaries, no further changes are required to the Corfu runtime. When transactions cross object boundaries, Corfu changes the behavior of its EndTX call to multiappend the commit record to all the streams involved in the write set. This scheme ensures two important properties required for atomicity and isolation. First, a transaction that affects multiple objects occupies a single position in the global ordering; in other words, there is only one commit record per transaction in the raw shared log. Second, a client hosting an object sees every transaction that impacts the object, even if it hosts no other objects. When a commit record is appended to multiple streams, each Corfu runtime can encounter it multiple times, once in each stream it plays. The first time it encounters the record at a position X, it plays all the streams involved until position X, ensuring that it has a consistent snapshot of all the objects touched by the transaction as of X. It then checks for read conflicts and determines the commit/abort decision. When each client does not host a view for every object in the system, writes or reads can involve objects that are not locally hosted at either the client that generates the commit record or the client that encounters it. We examine each of these cases: A. Remote writes at the generating client: The generating client – i.e., the client that executed the transaction and created the commit record – may want to write to a remote object. This case is easy to handle; as we describe later, a client does not need to play a stream to append to it, and hence the generating client can simply append the commit record to the stream of the remote object. B. Remote writes at the consuming client: A client may encounter commit records generated by other clients that involve writes to objects it does not host; in this case, it simply updates its local objects while ignoring updates to the remote objects. Remote-write transactions are an important capability.

Applications that partition their state across multiple objects can now consistently move items from one partition to the other. In our evaluation, we implement Apache ZooKeeper as a Corfu object, create a partitioned name space by running multiple instances of it,maceta cuadrada and move keys from one name space to the other using remote-write transactions. Another example is a producer consumer queue; with remote-write transactions, the producer can add new items to the queue without having to locally host it and see all its updates. C. Remote reads at the consuming client: Here, a client encounters commit records generated by other clients that involve reads to objects it does not host; in this case, it does not have the information required to make a commit/abort decision since it has no local copy of the object to check the read version against. To resolve this problem, we add an extra round to the conflict resolution process, as shown in Figure 5.3. The client that generates and appends the commit record immediately plays the log forward until the commit point, makes a commit/abort decision for the record it just appended, and then appends an extra decision record to the same set of streams. Other clients that encounter the commit record but do not have locally hosted copies of the objects involved can now wait for this decision record to arrive. Since decision records are only needed for this particular case, the Corfu object interface described in Section 5.1 is extended with an is Shared function, which is invoked by the Tango runtime and must return true if decision records are required. Significantly, the extra phase adds latency to the transaction but does not increase the abort rate, since the conflict window for the transaction is still the span in the shared log between the reads and the commit record. D. Remote reads at the generating client: Corfu does not currently allow a client to execute transactions and generate commit records involving remote reads. Calling an accessor on an object that does not have a local view is problematic, since the data does not exist locally; possible solutions involve invoking an RPC to a different client with a view of the object, if one exists, or recreating the view locally at the beginning of the transaction, which can be too expensive.

If we do issue RPCs to other clients, conflict resolution becomes problematic; the node that generated the commit record does not have local views of the objects read by it and hence cannot check their latest versions to find read-write conflicts. As a result, conflict resolution requires a more complex, collaborative protocol involving multiple clients sharing partial, local commit/abort decisions via the shared log; we plan to explore this as future work. A second limitation is that a single transaction can only write to a fixed number of Corfu objects. The multiappend call places a limit on the number of streams to which a single entry can be appended. As we will see in the next section, this limit is set at deployment time and translates to storage overhead within each log entry, with each extra stream requiring 12 to 20 bytes of space in a 1KB to 4KB log entry.The decision record mechanism described above adds a new failure mode to Tango: a client can crash after appending a commit record but before appending the corresponding decision record. A key point to note, however, is that the extra decision phase is merely an optimization; the shared log already contains all the information required to make the commit/abort decision. Any other client that hosts the read set of the transaction can insert a decision record after a time-out if it encounters an orphaned commit record. If no such client exists and a larger time-out expires, any client in the system can reconstruct local views of each object in the read set synced up to the commit offset and then check for conflicts. vCorfu presents itself as an object store to applications. Developers interact with objects stored in vCorfu and a client library, which we refer to as the vCorfu runtime, provides consistency and durability by manipulating and appending to the vCorfu stream store. Today, the vCorfu runtime supports Java, but we envision supporting many other languages in the future. The vCorfu runtime is inspired by the Tango runtime, which provides a similar distributed object abstraction in C++. On top of the features provided by Tango, such as linearizable reads and transactions, vCorfu leverages Java language features which greatly simplify writing vCorfu objects. Developers may store arbitrary Java objects in vCorfu, we only require that the developer provide a serialization method and to annotate the object to indicate which methods read or mutate the object, as shown in Figure 5.4. Like Tango, vCorfu fully supports transactions over objects with stronger semantics than most distributed data stores, thanks to inexpensive global snapshots provided by the log. In addition, vCorfu also supports transactions involving objects not in the run time’s local memory , opacity, which ensures that transactions never observe inconsistent state, and read-own-writes which greatly simplifies concurrent programming. Unlike Tango, the vCorfu runtime never needs to resolve whether transactional entries in the log have succeeded thanks to a lightweight transaction mechanism provided by the sequencer.Each object can be referred to by the id of the stream it is stored in. Stream ids are 128 bits, and we provide a standardized hash function so that objects can be stored using human-readable strings. vCorfu clients call open with the stream id and an object type to obtain a view of that object. The client also specifies whether the view should be local, which means that the object state is stored in-memory locally, or remote, which means that the stream replica will store the state and apply updates remotely. Local views are similar to objects in Tango and especially powerful when the client will read an object frequently throughout the lifespan of a view: if the object has not changed, the runtime only performs a quick check call to verify no other client has modified the object, and if it has, the runtime applies the relevant updates. Remote views, on the other hand, are useful when accesses are infrequent, the state of the object is large, or when there are many remote updates to the object – instead of having to playback and store the state of the object in-memory, the runtime simply delegates to the stream replica, which services the request with the same consistency as a local view. To ensure that it can rapidly service requests, the stream replicas generate periodic checkpoints. Finally, the client can optionally specify a maximum position to open the view to, which enables the client to access the history, version or snapshot of an object.