Category Archives: research

Infrequently asked questions on deterministic distributed transaction management

Updated since my original post: added plug; explained why centralization is still hard; noted the issues unaddressed by this work; clarified that Daniel is focusing on ACID.

My homeboy Daniel Abadi baits flames in true Stonebraker style in his latest blog post, proclaiming: “the NoSQL decision to give up on ACID is the lazy solution to these scalability and replication issues.” Then he shows you how he promises to pimp your database by imposing a deterministic serial ordering on all transactions.

Without further ado, it’s time for some Infrequently Asked Questions On Deterministic Distributed Transaction Management!

I’m busy. Gimme the executive summary.

Daniel introduces his new VLDB paper, where he says he’s got the distributed DBMS your distributed DBMS could smell like, if only you used a centralized, deterministic transaction manager instead of lady-scented body wash. Specifically, he proposes avoiding the dreaded two-phase commit by centrally controlling concurrency, and minimizing lock time with optimistic operation.
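To make that concrete, here's a toy Python sketch of the core idea (my own illustration, not the paper's actual protocol): a central sequencer assigns every transaction a global position in a log, and each replica just replays that log in order, so replicas agree on commit order without a two-phase commit round.

```python
import threading
from collections import deque

class CentralSequencer:
    """Toy central sequencer: assigns each transaction a global sequence
    number before execution. (Hypothetical sketch, not the paper's design.)"""
    def __init__(self):
        self._lock = threading.Lock()
        self._next_seq = 0
        self.log = deque()  # globally ordered transaction log

    def submit(self, txn):
        with self._lock:
            seq = self._next_seq
            self._next_seq += 1
            self.log.append((seq, txn))
        return seq

class Replica:
    """Each replica replays the same log in sequence order, so all replicas
    reach the same state deterministically -- no two-phase commit needed to
    agree on commit order."""
    def __init__(self):
        self.state = {}

    def apply(self, log):
        for seq, txn in sorted(log):
            txn(self.state)  # deterministic: same order everywhere

# Usage: two replicas replaying the same log converge to the same state.
sequencer = CentralSequencer()
sequencer.submit(lambda s: s.__setitem__("x", 1))
sequencer.submit(lambda s: s.update(x=s.get("x", 0) + 41))

r1, r2 = Replica(), Replica()
r1.apply(sequencer.log)
r2.apply(sequencer.log)
assert r1.state == r2.state == {"x": 42}
```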

I react.


Notes wiki

I started using a gitit wiki to maintain my notes on CS research and programming. It’s available here. For a while, I’ve kept my notes in a bunch of “loosely pandoc” files, so gitit was an easy way to wiki-fy everything—to make them easier to view/edit from any browser, and to share publicly.

At the moment my notes are highly disorganized and probably have many formatting bugs. Also, they typically don’t include topics from classes I’ve taken. I’m hoping this will become a not-too-cryptic way for me to share more information with less effort than writing blog posts.

GRE e-rater

It turns out that the ETS has a whole research division, including researchers in natural language processing who come up with stuff like e-rater and other machine essay graders, and that they publish about these systems.

According to the latest system description paper:

The feature set used with e-rater V.2 includes measures of grammar, usage, mechanics, style, organization, development, lexical complexity, and prompt-specific vocabulary usage.

E-rater is part of Criterion, a web-based service that provides students with instant scoring and feedback on their submitted essays. Criterion has a number of writing analysis tools whose outputs form the feature vector used by e-rater. The score is a simple weighted average of the feature values.
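As a back-of-the-envelope illustration, here's what that scoring step looks like in Python. The feature names come from the quote above, but the weights and value scales are made up, not ETS's actual numbers.

```python
# Toy weighted-average scoring model. Feature names follow the e-rater V.2
# description; the weights and the 0-6 value scale are invented for this sketch.
FEATURE_WEIGHTS = {
    "grammar": 0.15,
    "usage": 0.10,
    "mechanics": 0.10,
    "style": 0.10,
    "organization": 0.20,
    "development": 0.20,
    "lexical_complexity": 0.10,
    "prompt_vocabulary": 0.05,
}

def score_essay(feature_values):
    """Return the weighted average of the essay's feature values."""
    total_weight = sum(FEATURE_WEIGHTS.values())
    weighted_sum = sum(FEATURE_WEIGHTS[f] * v for f, v in feature_values.items())
    return weighted_sum / total_weight

# Example: an essay strong on organization/development, weaker on mechanics.
features = {
    "grammar": 4.5, "usage": 4.0, "mechanics": 3.0, "style": 4.0,
    "organization": 5.5, "development": 5.0,
    "lexical_complexity": 4.0, "prompt_vocabulary": 3.5,
}
print(round(score_essay(features), 2))
```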

One noteworthy detail is that in determining the parameters of this model, e-rater eschews purely statistical machine-learning (optimization) approaches in favor of judgmental control over the weights, both to avoid unintentional skew and other undesirable statistical effects, and for transparency, making the system easier to understand and explain.

It would be interesting to see how straightforward it is to game e-rater, given the above information and access to the implementation in Criterion.