Position: Ph.D. student

Current Institution: University of Washington

Automating Data Management and Storage for Reactive, Wide-area Applications with Diamond

Users of today’s popular wide-area apps (e.g., Twitter, Google Docs, and Words with Friends) no longer save and reload when updating shared data; instead, these applications are reactive, providing the illusion of continuous synchronization across mobile devices and the cloud. Maintaining this illusion presents a challenging distributed data management problem for application programmers. Modern reactive applications consist of widely distributed processes sharing data across mobile devices and cloud servers. These processes make concurrent data updates, can stop or fail at any time, and may be connected by slow or unreliable links. While distributed storage systems can provide persistence and availability, programmers still face the formidable challenge of synchronizing updates between application processes and distributed storage in a way that is fault-tolerant and consistent in a wide-area environment.

This talk presents Diamond, the first reactive data management service for wide-area applications. Diamond performs the following functions on behalf of the application: (1) it ensures that updates to shared data are consistent and durable, (2) it reliably coordinates and synchronizes shared data updates across processes, and (3) it automatically triggers reactive code when shared data changes so that processes can perform appropriate tasks. For example, when a user makes an update from one device (e.g., a move in a multi-player game), Diamond persists the update, reliably propagates it to other users’ devices, and transparently triggers application code on those devices to react to the changes.
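
As a rough illustration of function (3), the toy sketch below re-runs registered application code whenever a key it previously read is updated. It is a single-process stand-in written for this summary, not Diamond's API: the names `ToyReactiveStore`, `register_reaction`, and `put` are hypothetical, and durability and cross-device propagation are not modeled.

```python
class _TrackingView:
    """Read-only view that records which keys a reaction reads."""

    def __init__(self, data):
        self._data = data
        self.read_set = set()

    def get(self, key, default=None):
        self.read_set.add(key)
        return self._data.get(key, default)


class ToyReactiveStore:
    """Hypothetical in-process store; illustrates reactivity only."""

    def __init__(self):
        self._data = {}
        self._reactions = []  # each entry is [read_set, callback]

    def _run(self, callback):
        # Execute the callback against a tracking view and return its read set.
        view = _TrackingView(self._data)
        callback(view)
        return view.read_set

    def register_reaction(self, callback):
        # Run once immediately, remembering what was read.
        self._reactions.append([self._run(callback), callback])

    def put(self, key, value):
        self._data[key] = value
        # Re-execute only the reactions that read the updated key.
        for entry in self._reactions:
            if key in entry[0]:
                entry[0] = self._run(entry[1])


store = ToyReactiveStore()
seen = []  # stands in for a UI update on another device
store.register_reaction(lambda view: seen.append(view.get("last_move")))
store.put("last_move", "e4")  # read by the reaction, so it re-runs
store.put("score", 10)        # not read by the reaction, so nothing runs
```

After these calls, `seen` holds the initial empty read followed by the new move; the update to `score` triggers nothing because the reaction never read that key.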

Reactive data management in the wide-area context requires delicate balancing; thus, Diamond implements the difficult mechanisms required by these applications (such as logging and concurrency control), while allowing programmers to focus on high-level data-sharing requirements (e.g., atomicity, concurrency, and data layout). Diamond introduces three new concepts:

* Reactive Data Map (rmap), a primitive that lets applications create reactive data types (shared, persistent in-memory data structures) and map them into Diamond so it can automatically synchronize them across distributed processes and persistent storage.

* Reactive Transactions, an interactive transaction type that automatically re-executes in response to shared data updates. Unlike materialized views or database triggers, these “live” transactions run application code to perform local, application-specific functions (e.g., UI changes).

* Data-type Optimistic Concurrency Control (DOCC), a concurrency control mechanism that leverages data-type semantics to concurrently commit transactions executing commutative operations (e.g., writes to different list elements, increments to a counter). Our experiments show that DOCC is critical to coping with wide-area latencies, reducing abort rates by up to 5x.
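
The idea behind DOCC can be sketched with a toy optimistic concurrency control scheme. The code below is an assumption-laden illustration, not Diamond's implementation: `ToyStore`, `Txn`, and their methods are hypothetical names, commits run sequentially in one process, and only counter increments are treated as commutative. A type-oblivious OCC treats an increment as a read followed by a write, so two concurrent increments conflict; a type-aware increment records no read and therefore passes validation.

```python
class ToyStore:
    """Hypothetical versioned key-value store."""

    def __init__(self):
        self.values = {}
        self.versions = {}  # key -> number of committed changes

    def bump(self, key):
        self.versions[key] = self.versions.get(key, 0) + 1


class Txn:
    """Toy optimistic transaction: buffer changes, validate reads at commit."""

    def __init__(self, store):
        self.store = store
        self.reads = {}       # key -> version observed at first read
        self.writes = {}      # key -> value to install
        self.increments = {}  # key -> accumulated delta

    def read(self, key):
        self.reads.setdefault(key, self.store.versions.get(key, 0))
        return self.store.values.get(key, 0)

    def write(self, key, value):
        self.writes[key] = value

    def increment_plain_occ(self, key, delta=1):
        # Type-oblivious view: an increment is a read followed by a write.
        self.write(key, self.read(key) + delta)

    def increment(self, key, delta=1):
        # Type-aware view: a commutative operation that records no read.
        self.increments[key] = self.increments.get(key, 0) + delta

    def commit(self):
        # Validation: abort if anything this transaction read has changed.
        for key, seen in self.reads.items():
            if self.store.versions.get(key, 0) != seen:
                return False
        for key, value in self.writes.items():
            self.store.values[key] = value
            self.store.bump(key)
        for key, delta in self.increments.items():
            self.store.values[key] = self.store.values.get(key, 0) + delta
            self.store.bump(key)
        return True


store = ToyStore()

# Two concurrent increments under plain OCC: the second one aborts.
a, b = Txn(store), Txn(store)
a.increment_plain_occ("likes")
b.increment_plain_occ("likes")
plain_results = (a.commit(), b.commit())

# The same workload with type-aware increments: both commit.
c, d = Txn(store), Txn(store)
c.increment("likes")
d.increment("likes")
typed_results = (c.commit(), d.commit())
```

In this sketch a transaction that reads the counter's value is still invalidated by a concurrent increment, since increments bump the version; only operations that commute with each other avoid the abort.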

We designed and implemented a Diamond prototype in C++ with language bindings for C++, Python, and Java on both x86 and Android platforms. To evaluate Diamond, we built and measured both Diamond versions and custom versions (using explicit data management) of four reactive apps. Our experiments show that Diamond significantly reduces the complexity and size of reactive applications, provides strong transactional guarantees that eliminate common data races, and supports automatic reactivity with performance close to custom-written reactive apps.


I am a fourth-year PhD student, working with Hank Levy and Arvind Krishnamurthy in the Computer Systems Lab at the University of Washington. My PhD research focuses on distributed systems for large-scale applications with two main directions: (1) distributed programming platforms for mobile-cloud applications and (2) high-performance transactional storage for datacenter applications.

Sapphire [1] is a new distributed programming platform that provides customizable and extensible deployment of mobile/cloud applications. Sapphire’s key design feature is its distributed runtime system, which supports a flexible and extensible deployment layer for solving complex distributed systems tasks, such as fault-tolerance, code-offloading, and caching. Rather than writing distributed systems code, programmers choose deployment managers that extend Sapphire’s kernel to meet their applications’ deployment requirements. In this way, each application runs on an underlying platform that is customized for its own distribution needs.

TAPIR [2] is a new protocol for distributed transactional storage systems that enforces a linearizable transaction ordering using a replication protocol with no ordering at all. The key insight behind TAPIR is that existing transactional storage systems waste work by layering a strong transaction protocol on top of a strong replication protocol. Instead, we designed inconsistent replication (IR), the first replication protocol to provide fault-tolerance with no consistency guarantees. TAPIR, the Transactional Application Protocol for Inconsistent Replication, provides linearizable transactions using IR. By enforcing strong consistency only in the transaction protocol, TAPIR can commit transactions in a single round-trip and order distributed transactions without centralized coordination.

Before starting my PhD, I worked for three years at VMware in the virtual machine monitor group on virtual machine checkpointing. My work on Halite [3] used working set estimation to improve the performance of restoring virtual machines on the VMware hypervisor.

I received my S.B. in computer science from MIT in 2008 and my M.Eng. in 2009. For my M.Eng., I worked with Frans Kaashoek and Robert Morris on a flexible, wide-area distributed storage system [4].

[1] Customizable and Extensible Deployment for Mobile/Cloud Applications. I. Zhang, A. Szekeres, D. Van Aken, I. Ackerman, S.D. Gribble, A. Krishnamurthy, and H.M. Levy. Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). October 2014.

[2] Building Consistent Transactions with Inconsistent Replication. I. Zhang, N.K. Sharma, A. Szekeres, A. Krishnamurthy, and D.R.K. Ports. Proceedings of the ACM Symposium on Operating Systems Principles (SOSP). October 2015.

[3] Optimizing VM Checkpointing for Restore Performance in VMware ESXi. I. Zhang, T. Denniston, Y. Baskakov, and A. Garthwaite. Proceedings of the USENIX Annual Technical Conference (ATC). June 2013.

[4] Flexible, Wide-Area Storage for Distributed Systems with WheelFS. J. Stribling, Y. Sovran, I. Zhang, X. Pretzer, J. Li, M.F. Kaashoek, and R. Morris. Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). April 2009.