No-nonsense getting started with standalone Hadoop and Dumbo on Ubuntu

Dumbo is a nifty Python package, written by the Audioscrobbler data crunchers, that lets you write Hadoop jobs in Python via Hadoop Streaming. In this getting-started guide, we’ll install Cloudera’s distribution of Hadoop and Dumbo on Ubuntu with minimal fuss. For more elaborate documentation, see the Cloudera documentation archives.
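Dumbo jobs are just Python generator functions, so the classic word count gives a feel for the programming model. The `mapper`/`reducer` shape and the `dumbo.run(mapper, reducer)` call follow Dumbo’s own word-count example. `run_locally` is a hypothetical helper, not part of Dumbo, that imitates Hadoop’s map, sort, and reduce phases in memory so the sketch can be tried without a cluster. It numbers input lines rather than using byte offsets, which is an assumption of this sketch.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key identifies the input line (unused here); value is the line's text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values iterates over every count emitted for this word.
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical helper: imitate map -> sort -> reduce without Hadoop."""
    mapped = [kv for lineno, line in enumerate(lines) for kv in mapper(lineno, line)]
    # Hadoop sorts mapper output by key before handing groups to the reducer.
    mapped.sort(key=itemgetter(0))
    result = []
    for word, group in groupby(mapped, key=itemgetter(0)):
        result.extend(reducer(word, (count for _, count in group)))
    return result


if __name__ == "__main__":
    try:
        import dumbo
    except ImportError:
        # Dumbo not installed: fall back to the in-memory imitation.
        print(run_locally(["the quick brown fox", "the lazy dog"]))
    else:
        dumbo.run(mapper, reducer)
```

On a machine with Hadoop and Dumbo set up, the script is handed to Dumbo’s launcher, which takes care of the Streaming plumbing; the fallback branch exists only so the example runs standalone.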

Application-based packet filtering on Linux

iptables can’t filter on process ID or any other “direct” application identifier, so you can’t say things like, “Allow only Firefox to send/receive packets.” It can, however, filter on user or group ID, which enables user-based packet filtering: you can at least restrict an application by running it under a dedicated uid/gid. The owner module (xt_owner) matches on the owner of the socket that generated a packet (see man iptables for details).

# iptables -m owner --help
iptables v1.4.4
owner match options:
[!] --uid-owner userid[-userid]      Match local UID
[!] --gid-owner groupid[-groupid]    Match local GID
[!] --socket-exists                  Match if socket exists

Of course, this all applies only to local sockets; if this system is serving as a router for other hosts, then you don’t have the uid/gid information for their sockets (if their OS even has those notions).
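To make the user-based approach concrete, here is a minimal Python sketch that only generates iptables commands; it applies nothing. It assumes Firefox runs as a dedicated user, here called `webuser` (a hypothetical name), and that only that user should be able to open outbound TCP connections to ports 80 and 443. Because the owner match only works for locally generated packets, the rules go in the OUTPUT chain.

```python
import shlex


def owner_rules(user, ports=(80, 443)):
    """Return iptables argument lists (nothing is applied) that let only `user`
    open outbound TCP connections to `ports` and reject the same traffic from
    every other local user. `user` may be a user name or a numeric uid."""
    rules = []
    # iptables walks a chain top to bottom and stops at the first rule whose
    # target is ACCEPT or REJECT, so the per-user ACCEPT rules come first.
    for port in ports:
        rules.append([
            "iptables", "-A", "OUTPUT", "-p", "tcp", "--dport", str(port),
            "-m", "owner", "--uid-owner", str(user), "-j", "ACCEPT",
        ])
    # Catch-all: the same destination ports from anyone else get rejected.
    for port in ports:
        rules.append([
            "iptables", "-A", "OUTPUT", "-p", "tcp", "--dport", str(port),
            "-j", "REJECT",
        ])
    return rules


if __name__ == "__main__":
    # Print the commands for review; run them as root to actually apply them.
    for rule in owner_rules("webuser"):
        print(shlex.join(rule))
```

Note the trade-off: every other user, root included, loses outbound access to those ports, so anything else that talks HTTP(S) needs its own ACCEPT rule placed before the catch-all.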

Ubuntu 8.04 upgrade success

apt-get install -f got me out of the dependency-failure tarpit that was killing my attempts to apt-get install packages and to dpkg --configure -a. Another thing to try would’ve been apt-get clean, which empties the local cache of downloaded packages. A subsequent apt-get dist-upgrade worked (it took under 3 minutes).

Upgrading to Ubuntu 8.04

I have two Ubuntu 7.10 systems that I tried upgrading to 8.04. The first (x86_64) hit a bunch of problems during the Clean-Up phase, causing a whole slew of package-system complaints on reboot. The second (x86) failed during the Installing/Upgrading Packages stage, leaving the system in an “unusable state” in which I couldn’t start Gnome.

More details are on Launchpad: the tickets for the first system were all reported semi-automatically through the new error-reporting system (which marks tickets as private), and there is one ticket for the second system.

The last time I tried upgrading, from 7.04 to 7.10 on my Dell Latitude D600, it also failed and left my system in an “unusable state.”

OS X woes

Not to be taken too seriously or during pregnancy…

  • The setup asks you too many questions.
  • Keyboard user experience sucks.
    • Too much reliance on the mouse; can’t even tab to move among fields.
    • It’s inconsistent – some dialogs allow tab movement, and others don’t.
      • E.g.: Firefox’s remember-passwords dialog
      • E.g.: OS X setup wizard (during the barrage of questions)
    • Universal Access is cumbersome, especially when you need ctrl to work normally in Terminal.
  • AirPort sucks
    • It constantly loses my connection, whereas my Dell and IBM laptops never have this problem. It hides too much information – I can’t tell what’s going on (e.g., is it trying to fix my connection?).
    • When automation is unreliable, manual intervention should be easy. However, there is no manual “repair” button – I have to shut down AirPort, wait a couple of seconds, and then turn it on again.
    • There was a period during which AirPort actually lost its connection whenever I tried ssh-ing into previously ssh-ed hosts.
  • Terminal sucks
    • ANSI colors need to be configurable!
    • No font resizing
    • Occasionally, ctrl-c stops working, and I have to use cmd-.
  • The PowerBook display is too white. Seriously! E.g., when I use MS Remote Desktop to connect to my main XP machine, everything is tinted too white, even at minimum brightness. At first it looks kinda cool, but it really strains your eyes after a short while.
  • Apple Mail is b0rked
    • Apple Mail makes you fail before it lets you enable SSL on an IMAP account. I suspect it was trying, and failing, to auto-discover the strongest authentication protocol available.
    • It’s actually unable to download all the messages in my IMAP account (only a couple thousand of them). It downloads in bursts of a couple dozen; you have to keep clicking somewhere other than the Inbox and then back on the Inbox for the damn thing to resume downloading. I’ve sat there forever, thinking it would eventually resume on its own, but no such luck. I gave up before it reached 40% of my messages.
  • SUICIDE MODE! Once, after I closed the lid, the computer kept running instead of suspending and became flaming hot. Opening the lid didn’t do anything: the screen stayed blank, yet the computer was still somewhat responsive (pressing Caps Lock still lit the Caps Lock LED). I had to power-cycle it.
  • Safari sucks. But this would be another article by itself, so I won’t go into it. Just install Opera and Firefox.
  • The update system makes me click through O(n) license agreements. Also, the “Pause” button lies: it actually aborts the entire download, which then restarts from scratch.
  • Hardware seems to be pretty good, aside from the suicide.
    • Suspend/resume is super-fast.
    • Although mine doesn’t have one, I’ve seen the glow-in-the-dark keyboards.
    • The magnetic hook is nice. Attention to subtle details is always appreciated.
    • You do pay a premium for Macs and OS X, though.
  • Eye candy is great.
  • Things I’ve yet to figure out
    • The file system organization – this should be interesting (e.g., what exactly are all the apps? Single files?)
    • Many of the native apps, plus the apps Sam gave me (Omni*, iWork)
    • Fink, DarwinPorts, etc.


I just got my first BSOD in a long, long time. It seems to have something to do with the WinPcap driver thingy, since Windows blamed npf.sys and I was running nmap at the time.
For some reason, I didn’t get a dialog asking me to report the error to MS. I checked %SYSTEMROOT%\Minidump and found that my most recent dump was actually from back in October (for a total of three dumps in all, the other from 8/05). I’m curious to see what the problems were. One day (not any time soon) I’ll whip out windbg once more.

I hate Linux

I’m going to regret this…

Preparing to replace libc6 2.3.2.ds1-22 (using .../libc6_2.3.5-12.1_i386.deb) ...

Name Service Switch update in the C Library: pre-installation question.

Running services and programs that are using NSS need to be restarted,
otherwise they might not be able to do lookup or authentication any more.
The installation process is able to restart some services (such as ssh or
telnetd), but other programs cannot be restarted automatically.  One such
program that needs manual stopping and restart after the glibc upgrade by
yourself is xdm - because automatic restart might disconnect your active
X11 sessions.

Known packages that need to be stopped before the glibc upgrade are:
xdm kdm gdm postgresql xscreensaver

This script did not detect any installed services which need to be

If you want to interrupt the upgrade now and continue later, please
answer No to the question below.

Do you want to upgrade glibc now? [Y/n] y