No-nonsense getting started with standalone Hadoop and Dumbo on Ubuntu

Dumbo is a nifty Python package from the Audioscrobbler data crunchers at last.fm that lets you write Hadoop jobs in Python on top of Hadoop Streaming. In this getting-started guide, we’ll install Cloudera’s distribution of Hadoop, plus Dumbo, on Ubuntu with minimal fuss. For more elaborate documentation, see the Cloudera documentation archives.
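To give a feel for what a Dumbo job looks like, here is a minimal word-count sketch. The `mapper` and `reducer` generators follow the shape Dumbo expects (a mapper takes a key and a value, a reducer takes a key and an iterable of values). The `run_locally` helper is purely hypothetical and not part of Dumbo: it imitates the map, shuffle and reduce steps in memory so the logic can be tried without a Hadoop install.

```python
from collections import defaultdict


def mapper(key, value):
    # key: position of the line in the input (ignored here)
    # value: one line of text
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # key: a word; values: every count emitted for that word
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical helper (not part of Dumbo): map, group by key, reduce."""
    grouped = defaultdict(list)
    for position, line in enumerate(lines):
        for word, count in mapper(position, line):
            grouped[word].append(count)
    results = []
    for word in sorted(grouped):
        results.extend(reducer(word, grouped[word]))
    return results


if __name__ == "__main__":
    # Under Dumbo itself, the script would instead end with:
    #     import dumbo
    #     dumbo.run(mapper, reducer)
    print(run_locally(["to be or not to be"]))
```

Keeping the mapper and reducer as plain generator functions means the same two functions can be checked locally first and then handed to Dumbo once Hadoop is up and running.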