Monthly Archives: November 2008

Practical media downloading

A random assortment of tips for getting stuff with some mix of expedience, safety, and broad coverage.

  • For single tracks, start with Skreemr. It doesn’t manage to find that much, unfortunately, but it’s a quick check. Other similar services include BeeMP3 and SeeqPod, which have (IMO) clumsier UIs. Once you’ve checked with Skreemr, just try Googling for the artist, title, and “mp3” – this usually works, esp. for popular tracks, but is less slick than Skreemr since you’ll probably land at a site where you need to satisfy some captcha, run a gauntlet of ads or gaudy web designs, wait some number of seconds, find a broken link, and retry with the next few hits.
  • Alternatively, download full albums. Even if you’re after just a single track, if you can’t find it, you may be able to find its album. You can do this by Googling for the artist and album name alongside the names of popular “private” file-sharing services (“rapidshare | megaupload | …”). These file hosting services are actually good for finding large media files in general, including movies and TV shows. I have a little script for making these kinds of searches.
  • You can always resort to BitTorrent or a file-sharing network if you’re feeling a bit more promiscuous. If you’re on Windows, one of the better file-sharing clients is Shareaza, which supports Gnutella, ed2k, and “Gnutella2.” For BitTorrent, Shareaza’s support wasn’t that great when I tried. I’ve heard good things about uTorrent, which is probably the most popular client on Windows, but I’ve only ever used the cross-platform Vuze (formerly Azureus). Vuze sports an awesome search interface that lets you breezily grab torrents from some of the popular sites out there including btjunkie and mininova.
  • When engaging in promiscuous file-sharing, try to operate from a WLAN for which logs are not kept (e.g., StataCenter). For a general public WLAN, you can fiddle with your MAC address and minimize concurrent traffic from the same host (esp. anything that could identify you).
  • You can use an IP filter (“firewall”) to prevent communication with certain hosts. BlueTack maintains popular blocklists. A paper from UCR called “P2P: Is Big Brother Watching You?” (Ars Technica article) concludes that users will exchange data with blocklisted users 100% of the time, and that blocking just 5 IPs reduces this to 1%.
  • Blocklist managers are optimized to filter large numbers/ranges of IPs. PeerGuardian 2 and moblock are popular blocklist managers for Windows and Linux, respectively.
  • Should you be simply unable to find the song anywhere but a streaming source, such as Songza, Last.fm, MySpace, the artist’s website, etc., then just capture your system audio output while playing it from the streaming source. (Streaming audio quality tends to be lower, though.)
  • Find iTunes shares and public file shares (CIFS/SMB, FTP, etc.) on your LAN. Might work well if you’re in something like a dorm or frat setting, depending on the network configuration. I don’t know if there’s some working continuation of myTunes or if ourTunes still works, but those are places to start looking for iTunes pulling.
  • For TV shows, another possibility is to just watch them streamed from the web. You can find a lot on YouTube (though certain videos might just be around for short windows of time). Sidereel is a community that organizes links to these videos into shows and episodes.
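
As a rough illustration of why blocklist managers cope well with large numbers of IP ranges, here is a minimal sketch of a range lookup using a sorted list and binary search. The `Blocklist` class, its method names, and the sample ranges (reserved documentation addresses) are hypothetical; this is not how PeerGuardian 2 or moblock are actually implemented, and it assumes the ranges do not overlap.

```python
import bisect
import ipaddress


class Blocklist(object):
    """Hypothetical sketch: membership test over sorted, non-overlapping IP ranges."""

    def __init__(self, ranges):
        # ranges: iterable of (first_ip, last_ip) strings, inclusive on both ends.
        parsed = sorted(
            (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)))
            for first, last in ranges
        )
        self.starts = [start for start, _ in parsed]
        self.ends = [end for _, end in parsed]

    def blocked(self, ip):
        n = int(ipaddress.ip_address(ip))
        # Find the last range whose start is <= n, then check its end.
        i = bisect.bisect_right(self.starts, n) - 1
        return i >= 0 and n <= self.ends[i]


# Sample ranges taken from the reserved documentation address blocks.
blocklist = Blocklist([
    ("198.51.100.10", "198.51.100.20"),
    ("192.0.2.0", "192.0.2.255"),
])
print(blocklist.blocked("192.0.2.7"))
print(blocklist.blocked("198.51.100.21"))
```

Each lookup costs a binary search rather than a scan of every range, which is what keeps lists with many thousands of entries practical.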

Updated 7/21/2009: added note on Googling for single tracks.

Default behavior of Python’s cmp()

The docs say:

If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).

However, this isn’t entirely accurate. It’s accurate for classic classes, but instances of different new-style classes (classes that derive from object) are first compared by their type names, and then by the addresses of their types if the type names are identical. (Python 3 has only new-style classes, but it removes cmp() and this fallback ordering altogether.) I don’t know why this is done, but I noticed this behavior while writing an autograder for the class I’m TAing.

From the file Objects/object.c in Python 2.6:

/* Final fallback 3-way comparison, returning an int.  Return:
   -2 if an error occurred;
   -1 if v <  w;
    0 if v == w;
    1 if v >  w.
*/
static int
default_3way_compare(PyObject *v, PyObject *w)
{
  int c;
  const char *vname, *wname;

  if (v->ob_type == w->ob_type) {
    /* When comparing these pointers, they must be cast to
     * integer types (i.e. Py_uintptr_t, our spelling of C9X's
     * uintptr_t).  ANSI specifies that pointer compares other
     * than == and != to non-related structures are undefined.
     */
    Py_uintptr_t vv = (Py_uintptr_t)v;
    Py_uintptr_t ww = (Py_uintptr_t)w;
    return (vv < ww) ? -1 : (vv > ww) ? 1 : 0;
  }

  /* None is smaller than anything */
  if (v == Py_None)
    return -1;
  if (w == Py_None)
    return 1;

  /* different type: compare type names; numbers are smaller */
  if (PyNumber_Check(v))
    vname = "";
  else
    vname = v->ob_type->tp_name;
  if (PyNumber_Check(w))
    wname = "";
  else
    wname = w->ob_type->tp_name;
  c = strcmp(vname, wname);
  if (c < 0)
    return -1;
  if (c > 0)
    return 1;
  /* Same type name, or (more likely) incomparable numeric types */
  return ((Py_uintptr_t)(v->ob_type) < (
    Py_uintptr_t)(w->ob_type)) ? -1 : 1;
}
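
To make the logic of the C function above easier to follow, here is a rough Python model of it. It is only a sketch: it approximates PyNumber_Check with `numbers.Number`, tp_name with `type(x).__name__`, and pointer values with `id()` (which is the object address in CPython). The `Apple` and `Banana` classes are made up for the demo.

```python
import numbers


def default_3way_compare(v, w):
    """Approximate model of the Python 2.6 fallback comparison."""
    if type(v) is type(w):
        # Same type: order by object address.
        return (id(v) > id(w)) - (id(v) < id(w))

    # None is smaller than anything.
    if v is None:
        return -1
    if w is None:
        return 1

    # Different types: compare type names; numbers get the empty name.
    vname = "" if isinstance(v, numbers.Number) else type(v).__name__
    wname = "" if isinstance(w, numbers.Number) else type(w).__name__
    if vname != wname:
        return -1 if vname < wname else 1

    # Same type name: fall back to the addresses of the type objects.
    return -1 if id(type(v)) < id(type(w)) else 1


class Apple(object):
    pass


class Banana(object):
    pass


# Instances of different classes are ordered by class name, not by address.
print(default_3way_compare(Apple(), Banana()))  # -1
print(default_3way_compare(Banana(), Apple()))  # 1
# Numbers sort before non-numbers because their name is treated as "".
print(default_3way_compare(1, "x"))  # -1
```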

Thanks to Szymon for the discussion.

Protocol Buffers vs. Thrift

I’ve been playing around a bit with Protocol Buffers and Thrift. Thrift has many more features:

  • generates RPC service implementations (PB only generates the interfaces); currently targets libevent
  • targets more languages
  • constants
  • exceptions
  • multiple protocols (binary, JSON)
  • asynchronous procedures
  • more collection types, such as maps/sets; this makes it easier to use the messages directly as the primary representations of your program’s data

Despite Thrift’s additional features, for the small project I’m currently working on, I’m going with PB. Some reasons:

  • (at least for C++) the interface to reading/writing messages feels “lighter”: you don’t need to manually construct transports/protocols/etc., heap-allocate and wrap things in shared_ptrs, perform copies to get at the data inside a TMemoryBuffer (unless you write your own TTransport), and so on.
  • PB is a more mature tool than Thrift
  • PB has better documentation
  • Thrift is not as fast or as compact as PB
  • Thrift has weaker encapsulation: it exposes public fields and uses the language’s standard library containers, which precludes tricks such as backing messages directly with the serialization buffer
  • small annoyances with Thrift, such as not having packages for, and not building cleanly on, Ubuntu 8.10

Python default parameters

I was recently bitten by this: “Default parameter values are evaluated when the function definition is executed.” Demo:

def mklist():
  print 'making list'
  return []

def f(x=mklist()):
  x.append(3)
  print x

print 'start'
f()
f()

The output:

making list
start
[3]
[3, 3]

Annoyingly, the above page from the language reference acknowledges that “This is generally not what was intended,” without justifying the status quo.
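
A common workaround is to use None as the default and build the real value inside the function body, so that it is evaluated on every call. A minimal sketch, reusing the demo's mklist() and f() names and returning the list instead of printing it:

```python
def mklist():
    print('making list')
    return []


def f(x=None):
    # The default is now created at call time, not at definition time.
    if x is None:
        x = mklist()
    x.append(3)
    return x


print('start')
print(f())
print(f())
```

With this version, “making list” is printed after “start” and once per call, and each call returns a fresh [3].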