Monthly Archives: January 2010

No-nonsense getting started with standalone Hadoop and Dumbo on Ubuntu

Dumbo is a nifty Python package from the Audioscrobbler data crunchers that lets you write Hadoop jobs (via Hadoop Streaming) in Python. In this getting-started guide, we’ll install Cloudera’s distribution of Hadoop and Dumbo on Ubuntu with minimal fuss. For more elaborate documentation, see the Cloudera documentation archives.
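
A Dumbo job is written as a pair of plain Python generator functions: a mapper that turns each input (key, value) pair into output pairs, and a reducer that combines all values for one key. Below is a minimal word-count sketch in that style. It is modeled on the usual Dumbo word-count example, not taken from this guide. So that it runs without a Hadoop cluster, `simulate_locally` is a hypothetical helper that imitates Hadoop Streaming's map, sort, and reduce phases in memory. With Dumbo installed, the script would instead end with a call along the lines of `dumbo.run(mapper, reducer, combiner=reducer)`. Check Dumbo's documentation for the exact signature.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # For text input, key is the line's byte offset (unused here); value is the line.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # All counts emitted for one word arrive together; sum them.
    yield key, sum(values)


def simulate_locally(lines, mapper, reducer):
    """Hypothetical helper: mimic Streaming's map -> sort -> reduce in memory."""
    mapped = []
    offset = 0
    for line in lines:
        mapped.extend(mapper(offset, line))
        offset += len(line.encode("utf-8")) + 1  # assumes "\n" line endings
    mapped.sort(key=itemgetter(0))  # Hadoop sorts map output by key
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in group)))
    return output


if __name__ == "__main__":
    # With Dumbo available, this block would be replaced by (assumed usage):
    #     import dumbo
    #     dumbo.run(mapper, reducer, combiner=reducer)
    print(simulate_locally(["the quick fox", "the lazy dog"], mapper, reducer))
```

Reusing the reducer as a combiner works here because summing partial counts gives the same result as summing all counts at once.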


Making sense of OpenID, OAuth, OpenSocial, Google Friend Connect, Facebook Connect, and more

Last Thursday I dropped in on the Google SIPB hackathon, where I got a chance to chat with several Googlers from the Cambridge office about the whole ecosystem of decentralized identity and social networking services. Strictly out of curiosity, I had previously spent some time searching for a high-level map of how these services relate to each other, but never found anything succinct, clear, and free of BS. There also seems to be a lot of contradictory information and general confusion out there. With recent news that Twitter is expected to roll out a similar stack of services, it seems like a good time to share what I’ve learned.

The executive summary:

  • OpenID: authentication; use one login across many sites
  • OpenID Attribute Exchange: a key-value store protocol for OpenID
  • OAuth: authorization (often also used for authentication); grant sites specific access to your account
  • OAuth WRAP: a simpler OAuth that relies on SSL/TLS (and thus PKI) instead of its own signature scheme
  • OpenSocial: a standard API into social networks
  • Google Friend Connect: an OpenSocial router, plus a bunch of other less-important stuff
  • Facebook Platform: all the above (and a bit more), for the Facebook stack
  • Facebook Connect: establish links between Facebook and third-party user account systems
  • Portable Contacts: just the slice of OpenSocial that deals with contacts
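
To make the OAuth vs. OAuth WRAP distinction concrete: OAuth 1.0 has the client sign every request itself, whereas WRAP sends a bearer token over HTTPS and lets TLS do that work. Below is a minimal sketch of OAuth 1.0's HMAC-SHA1 signature method. It assumes the URL is already normalized (lowercase scheme and host, no default port, no query string) and that each parameter name appears only once. `pct`, `signature_base_string`, and `hmac_sha1_signature` are illustrative names, not any OAuth library's API. A real client must also put the oauth_* protocol parameters (consumer key, token, nonce, timestamp, signature method) into `params`.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(s):
    # OAuth percent-encoding: escape everything except unreserved A-Z a-z 0-9 - . _ ~
    return quote(s, safe="~")


def signature_base_string(method, url, params):
    # Encode each name and value, then sort by encoded name (and value).
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    # METHOD & encoded URL & encoded parameter string
    return "&".join([method.upper(), pct(url), pct(normalized)])


def hmac_sha1_signature(method, url, params, consumer_secret, token_secret=""):
    base = signature_base_string(method, url, params)
    # The signing key combines both secrets, so no PKI is involved.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

For example, `hmac_sha1_signature("GET", "http://example.com/photos", params, "consumer-secret", "token-secret")` (hypothetical credentials) returns the base64 string that the client would send as `oauth_signature`. Under WRAP, none of this machinery exists. The token itself is the credential, which is why WRAP insists on HTTPS.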


Bitten by Python scoping

Yet again, I wasted too many minutes staring at and debugging my Python code due to the language’s funky variable scoping:

def relevant(xs, y):
  "Return elements in xs that are relevant to y."
  pairs = ((x, relevance(x,y)) for x in xs)
  return [(x,y) for x,y in pairs if y > 0]

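In this case, the y bound by the list comprehension overwrites the y that the generator expression reads. Because the generator is lazy, every call to relevance after the first receives the previous element’s relevance score instead of the original argument. This happens in Python 2, where list comprehension variables leak into the enclosing function scope; in Python 3, comprehensions get their own scope. Renaming the comprehension variable (e.g. to r) avoids the problem in both.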
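
Python 3 gives list comprehensions their own scope, so the snippet above no longer misbehaves there. However, the underlying hazard is still real: a generator expression looks up enclosing variables lazily. The sketch below reproduces the bug in Python 3 with an ordinary for loop, which does rebind function-scope names, and shows the rename fix next to it. Here `relevance` is passed in as a parameter, and the toy scoring function in the usage note is hypothetical, standing in for whatever the original code used.

```python
def relevant_buggy(xs, y, relevance):
    """Python 3 reproduction: a plain for loop rebinds the function's y."""
    pairs = ((x, relevance(x, y)) for x in xs)  # y is read lazily, per element
    result = []
    for x, y in pairs:  # rebinding y changes what the generator sees next
        if y > 0:
            result.append((x, y))
    return result


def relevant_fixed(xs, y, relevance):
    """Use a distinct name for the score so y is never rebound."""
    pairs = ((x, relevance(x, y)) for x in xs)
    return [(x, r) for x, r in pairs if r > 0]


def late_binding_demo():
    y = 1
    gen = (x + y for x in range(3))  # range(3) is evaluated now; y is not
    y = 100  # rebound before the generator is consumed
    return list(gen)  # every element uses y == 100
```

With a hypothetical `rel = lambda x, y: x - y`, `relevant_fixed([5, 6, 7], 4, rel)` returns `[(5, 1), (6, 2), (7, 3)]`, while `relevant_buggy([5, 6, 7], 4, rel)` returns `[(5, 1), (6, 5), (7, 2)]`, because each score is computed against the previous score rather than 4.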