Default behavior of Python’s cmp()

The docs say:

If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).

However, this isn’t entirely accurate. It’s accurate for classic classes, but new-style classes (the default in Python 3) are first compared by their type names (and then by their type IDs if their type names are identical). I don’t know why this is done, but I noticed this behavior while writing an autograder for the class I’m TAing.

From the file Objects/object.c in Python 2.6:

/* Final fallback 3-way comparison, returning an int.  Return:
   -2 if an error occurred;
   -1 if v <  w;
    0 if v == w;
    1 if v >  w.
*/
static int
default_3way_compare(PyObject *v, PyObject *w)
{
  int c;
  const char *vname, *wname;

  if (v->ob_type == w->ob_type) {
    /* When comparing these pointers, they must be cast to
     * integer types (i.e. Py_uintptr_t, our spelling of C9X's
     * uintptr_t).  ANSI specifies that pointer compares other
     * than == and != to non-related structures are undefined.
     */
    Py_uintptr_t vv = (Py_uintptr_t)v;
    Py_uintptr_t ww = (Py_uintptr_t)w;
    return (vv < ww) ? -1 : (vv > ww) ? 1 : 0;
  }

  /* None is smaller than anything */
  if (v == Py_None)
    return -1;
  if (w == Py_None)
    return 1;

  /* different type: compare type names; numbers are smaller */
  if (PyNumber_Check(v))
    vname = "";
  else
    vname = v->ob_type->tp_name;
  if (PyNumber_Check(w))
    wname = "";
  else
    wname = w->ob_type->tp_name;
  c = strcmp(vname, wname);
  if (c < 0)
    return -1;
  if (c > 0)
    return 1;
  /* Same type name, or (more likely) incomparable numeric types */
  return ((Py_uintptr_t)(v->ob_type) < (
    Py_uintptr_t)(w->ob_type)) ? -1 : 1;
}

Thanks to Szymon for the discussion.

Follow me on Twitter for stuff far more interesting than what I blog.

  • Chris Greene

    Hi! Thanks for this link. I suspect the reason for this sorting by class name is as follows:

    consider the list

    [100,”a”,5]

    we would expect that when this array is sorted we’d get 5 before 100.

    However, if mapping id to each element gave us

    [1,2,3]

    this list would aready be in sorted order*. And you would be sorting on this ground presumably because all adjacent types are different. To ensure that the sensible cmp function is called you need all like types together.

    *(yes, numbers are automatically less than any other element, so in a real example, we’d need custom types to pull this off)

  • Chris Greene

    Hi! Thanks for this link. I suspect the reason for this sorting by class name is as follows:

    consider the list

    [100,”a”,5]

    we would expect that when this array is sorted we’d get 5 before 100.

    However, if mapping id to each element gave us

    [1,2,3]

    this list would aready be in sorted order*. And you would be sorting on this ground presumably because all adjacent types are different. To ensure that the sensible cmp function is called you need all like types together.

    *(yes, numbers are automatically less than any other element, so in a real example, we’d need custom types to pull this off)