Category Archives: distributed systems

Google’s lesson learned: distributed transactions matter

In retrospect I think that [not supporting distributed transactions] was a mistake. We probably should have added that in because what ended up happening is a lot of people did want distributed transactions, and so they hand-rolled their own protocols, sometimes incorrectly, and it would have been better to build it into the infrastructure. So in Spanner we do have distributed transactions. We don’t have a lot of experience with it yet.

–Jeff Dean, in response to “What was the biggest mistake?” at the end of his Stanford EE380 lecture (note: video has since been replaced with that of another lecture)

Infrequently asked questions on deterministic distributed transaction management

Updated since my original post: added plug; explained why centralization is still hard; noted the issues unaddressed by this work; clarified that Daniel is focusing on ACID.

My homeboy Daniel Abadi baits flames in true Stonebraker style in his latest blog post, proclaiming: “the NoSQL decision to give up on ACID is the lazy solution to these scalability and replication issues.” Then he shows you how he promises to pimp your database by imposing a deterministic serial ordering to all transactions.

Without further ado, it’s time for some Infrequently Asked Questions On Deterministic Distributed Transaction Management!

I’m busy. Gimme the executive summary.

Daniel introduces his new VLDB paper, where he says he’s got the distributed DBMS your distributed DBMS could smell like, if only you used a centralized, deterministic transaction manager instead of lady-scented body wash. Specifically, he proposes avoiding the dreaded two-phase commit by centrally controlling concurrency, and minimizing lock time with optimistic operation.
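To make the core idea concrete, here is a toy sketch of my own. It assumes a single in-process sequencer and transactions written as deterministic Python functions. `Sequencer`, `Replica`, `deposit`, and `transfer` are hypothetical names that do not come from the paper, and the sketch ignores partitioning, locking, and failures. What it does show: once every replica agrees on the order, deterministic execution (including deterministic aborts) leaves all replicas in the same state without a commit vote.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: a transaction is a deterministic function over a key-value store.
Txn = Callable[[Dict[str, int]], None]


@dataclass
class Sequencer:
    """Hypothetical central sequencer: gives every transaction a slot in one global order."""
    log: List[Txn] = field(default_factory=list)

    def submit(self, txn: Txn) -> int:
        self.log.append(txn)
        return len(self.log) - 1  # the transaction's position in the global order


@dataclass
class Replica:
    state: Dict[str, int] = field(default_factory=dict)
    applied: int = 0

    def catch_up(self, log: List[Txn]) -> None:
        # Execute transactions strictly in the agreed order. Because each one is
        # deterministic, every replica ends in the same state, so no two-phase
        # commit vote is needed to agree on the outcome.
        while self.applied < len(log):
            log[self.applied](self.state)
            self.applied += 1


def deposit(acct: str, amt: int) -> Txn:
    def run(s: Dict[str, int]) -> None:
        s[acct] = s.get(acct, 0) + amt
    return run


def transfer(src: str, dst: str, amt: int) -> Txn:
    def run(s: Dict[str, int]) -> None:
        # Deterministic abort: every replica sees the same balance here and
        # makes the same decision.
        if s.get(src, 0) < amt:
            return
        s[src] -= amt
        s[dst] = s.get(dst, 0) + amt
    return run


seq = Sequencer()
for t in (deposit("a", 100), transfer("a", "b", 30), transfer("b", "a", 50)):
    seq.submit(t)

r1, r2 = Replica(), Replica()
r1.catch_up(seq.log)
r2.catch_up(seq.log)
print(r1.state == r2.state, r1.state)
```

Both replicas end with `{'a': 70, 'b': 30}`. The last transfer aborts on each of them for the same deterministic reason (insufficient funds), so no replica needs to ask another how things turned out.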

I react.


Protocol Buffers vs. Thrift

I’ve been playing around a bit with Protocol Buffers and Thrift. Thrift has many more features:

  • generates RPC service implementations (PB only generates the interfaces); currently targets libevent
  • targets more languages
  • constants
  • exceptions
  • multiple protocols (binary, JSON)
  • asynchronous procedures
  • more collection types, such as maps/sets; this makes it easier to use the messages directly as the primary representations of your program’s data

Despite Thrift’s additional features, for the small project I’m currently working on, I’m going with PB. Some reasons:

  • (at least for C++) the interface to reading/writing messages feels “lighter”: you don’t need to manually construct transports/protocols/etc., heap-allocate and wrap things in shared_ptrs, perform copies to get at the data inside a TMemoryBuffer (unless you write your own TTransport), and so on.
  • PB is a more mature tool than Thrift
  • better documentation
  • Thrift is not as fast or as compact as PB
  • Thrift has weaker encapsulation: its generated code exposes public fields and uses the language’s standard library containers, which precludes tricks such as backing messages directly with the serialization buffer
  • small annoyances, such as Thrift not having packages for, and not building cleanly on, Ubuntu 8.10
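One reason PB messages tend to be compact is its base-128 varint encoding of integers. Thrift's default binary protocol (TBinaryProtocol) instead writes fixed-width integers. The sketch below compares only the field-level encoding of one small integer under both schemes. The helper names are hypothetical. It assumes TBinaryProtocol's field header layout of a 1-byte type id (8 for i32) followed by a 2-byte field id. It ignores Thrift's struct stop byte, PB's other wire types, and Thrift's other protocols, and it says nothing about speed.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Base-128 varint as used by Protocol Buffers for non-negative integers."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def pb_int_field(field_number: int, value: int) -> bytes:
    # PB key = (field_number << 3) | wire_type; wire type 0 means varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def thrift_binary_i32_field(field_id: int, value: int) -> bytes:
    # Assumed TBinaryProtocol layout: 1-byte type (i32 = 8), 2-byte big-endian
    # field id, then the value as a 4-byte big-endian signed integer.
    return struct.pack(">bhi", 8, field_id, value)


pb = pb_int_field(1, 150)
tb = thrift_binary_i32_field(1, 150)
print(pb.hex(), len(pb))
print(tb.hex(), len(tb))
```

For field 1 holding 150, the PB encoding is the 3 bytes `08 96 01`, while the Thrift binary encoding is 7 bytes. Small values dominate many real messages, so the gap adds up.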