Category Archives: python

What’s there to like about R?

Update 10/11/2011: There’s a good discussion on Reddit
Update 10/12/2011: Note manipulate package and highlight data.table package

The R statistical computing platform is a rising star that’s been gaining popularity and attention, but it gets no respect in the hood. It’s telling that a popular guide to R is called The R Inferno, and that advocacy pieces are titled “Why R Doesn’t Suck.” Even the creator of R had this to say about the language in a damning article suggesting starting over with R:

I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too.

So why do people still use R? Would we lose anything if we just migrated to (say) Python, which many consider to be a major contender/alternative to R?

In this post I’m going to highlight a few things that are nice about R—not just in the platform itself, but in the whole ecosystem. These are things that you won’t necessarily find in alternate universes like Python’s.

Continue reading

Fast, native-C Protocol Buffers from Python

As of the latest release of Protocol Buffers (2.3), protoc --py_out generates only pure Python code. While PB can generate fast parsing and serialization code for C++, this isn’t made available to Python, and manually wrapping generated code amounts to very tedious maintenance work. This is a recurring feature request from the discussion group, but pure Python code was a higher priority because of certain client requirements—namely (according to the team), AppEngine.

Luckily, native code is slated for PB 2.4, and has been available in the svn trunk, so you’ve been able to use fast PBs for a while now. (We’ve been using r352 for some time and have not seen any problems.) The PB team has been understandably reluctant to specify any release date, but in my thread Kenton Varda does mention early 2011 as a rough estimate.

I haven’t seen this documented anywhere else yet, so hopefully this will be useful to others.

Continue reading

Cute silent failure building pycurl

I was trying to build and install pycurl, but it never actually installed properly. import pycurl just complained that the module was missing, and sure enough, I couldn’t find it anywhere, despite that ./setup.py install kept succeeding.

Peeking into an rpm for pycurl verified that pycurl.so was missing. So why wasn’t it getting installed? It wasn’t even in the build directory. It turns out that I had missed the key error message in the jumble below:

Continue reading

Bitten by Python scoping

Yet again, I wasted too many minutes staring at and debugging my Python code due to the language’s funky variable scoping:

def relevant(xs, y):
  "Return elements in xs that are relevant to y."
  pairs = ((x, relevance(x,y)) for x in xs)
  return [(x,y) for x,y in pairs if y > 0]

In this case, the y in the list comprehension modifies the binding used by the generator expression.

Default behavior of Python’s cmp()

The docs say:

If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).

However, this isn’t entirely accurate. It’s accurate for classic classes, but new-style classes (the default in Python 3) are first compared by their type names (and then by their type IDs if their type names are identical). I don’t know why this is done, but I noticed this behavior while writing an autograder for the class I’m TAing.

From the file Objects/object.c in Python 2.6:

/* Final fallback 3-way comparison, returning an int.  Return:
   -2 if an error occurred;
   -1 if v <  w;
    0 if v == w;
    1 if v >  w.
*/
static int
default_3way_compare(PyObject *v, PyObject *w)
{
  int c;
  const char *vname, *wname;

  if (v->ob_type == w->ob_type) {
    /* When comparing these pointers, they must be cast to
     * integer types (i.e. Py_uintptr_t, our spelling of C9X's
     * uintptr_t).  ANSI specifies that pointer compares other
     * than == and != to non-related structures are undefined.
     */
    Py_uintptr_t vv = (Py_uintptr_t)v;
    Py_uintptr_t ww = (Py_uintptr_t)w;
    return (vv < ww) ? -1 : (vv > ww) ? 1 : 0;
  }

  /* None is smaller than anything */
  if (v == Py_None)
    return -1;
  if (w == Py_None)
    return 1;

  /* different type: compare type names; numbers are smaller */
  if (PyNumber_Check(v))
    vname = "";
  else
    vname = v->ob_type->tp_name;
  if (PyNumber_Check(w))
    wname = "";
  else
    wname = w->ob_type->tp_name;
  c = strcmp(vname, wname);
  if (c < 0)
    return -1;
  if (c > 0)
    return 1;
  /* Same type name, or (more likely) incomparable numeric types */
  return ((Py_uintptr_t)(v->ob_type) < (
    Py_uintptr_t)(w->ob_type)) ? -1 : 1;
}

Thanks to Szymon for the discussion.

Python default parameters

I was recently bitten by this: “Default parameter values are evaluated when the function definition is executed.” Demo:

def mklist():
  print 'making list'
  return []

def f(x=[]):
  x.append(3)
  print x

print 'start'
f()
f()

The output:

making list
start
[3]
[3, 3]

Annoyingly, the above page from the language reference acknowledges that “This is generally not what was intended,” without justifying the status quo.

Facebook Programming Puzzles and Dynamic Programming in Haskell

My labmate pointed out to me that Facebook has some programming puzzles. I must have really been aching for Haskell again, because I decided to write up the first one, Evil Gambling Monster (Gamblor); it’s pasted below (TODO: upload the original problem somewhere for posterity). If we let A(t,x) = max possible score at time t at node x, then the recurrence relation is A(t,x) = delta score + (max over all possible previous positions px {A(t-1,px)}, or 0 if all the preceding are negative to capture the possibility of starting right at (t,x)). This one is actually fairly easy to reason about since any attempt to map out a table starting at 0,0 will take you to straight to the correct representation, which is the hardest part.

My first and only other Haskell DP is for the simple coin denomination problem, also pasted below. Haskell was fine for that, but I believe an imperative solution lends itself more naturally for Gamblor, because then we could avoid referencing ‘undefined’ cells of Nothings. Right now I’m a bit too sleepy to rewrite this in a mixed imperative/functional style (using, e.g., Scala or Python), but I’ll briefly try to explain what I mean.

The reason I believe an imperative solution is “superior” is because the data dependencies here are selective and resemble a sort of index-based permutation, and we’re trying to express this data flow with “pulling.” However, it could result in greater efficiency (and clarity) to express the propagation with “pushing,” i.e. destructively updating state (in this case, from the row earlier along the time dimension to the row later). This brought me to the realization that functional programming is “pull,” and imperative programming is “push.” This jives with my previously accepted perspectives on FP’s relative weaknesses – for instance, an imperative histogram builder is known to be more efficient because we push to a hash table instead of pulling tree map after tree map from the source data, and an imperative permutation inverter directly expresses the indices to mutate. In fact, both of these exhibit dictionary indirection in the push – perhaps that’s a more useful conclusion.

(To be sure, this is realized as a pretty small difference in the particular problem at hand – I doubt an imperative solution would be substantially better, especially in the face of a declarative style pervasive throughout the rest of the code. But use your imagination!)

Sadly, my solution is not pretty. Far from it. Again, too sleepy to care. 6.033 TA duties are teh sux.

(Update: cleaned up the code a bit and changed the output format to alternatively output just a synopsis.)

[haskell]
module Main
where

import Control.Arrow
import Data.Array
import Data.List
import Data.Maybe
import Text.Printf

— common —

seqArray a = foldr seq () $ elems a

compareBy f l r = f l `compare` f r

— gamblor, the evil gambling monster —

nx = 9
nt = 30

wins1 = [ 53,-84, 50,-73, 54, 60, 74, 22,-63,-78, 75, 72,
-46, 99,-33, 24, 6,-66, 77,-61,-60,-46,-52, 84,
91,-21,-52,-72,-39,-41]
wins2 = [ 77,-86,-25, 27,-59,-71,-13,-85, 50, 24,-63, 26,
-4,-10, 25, 62,-85,-68, 96, 92,-29,-64,-54, 18,
-79,-62, 97,-32,-35,-42]
wins3 = [ 27,-57,-28,-98, 69, 12,-70,-43, 27, 80, 80, 64,
6,-23,-45,-68,-60,-31,-36,-63,-39, 34,-27, 7,
-47, -7, 44,-50, 60,-90]
wins4 = [ 7,-12,-48, 79,-11,-78, -8, 19,-21,-81, -1,-40,
83,-95, 36,-62,-63, 76, 6, 0,-87, 67,-66,-15,
-26,-14, 78,-81, 36, 38]
wins5 = [-71,-56,-73,-20,-77, 15, 2, 14,-66, 81, 33, 33,
-59, 16, 37, 77, 53, 73, 53,-40,-26, 66,-73, 7,
-48, 1, 93,-70, 19, 30]
wins6 = [ 68, 47, 73, 94,-72, 96, 10, 30, 11, 44, 11,-56,
-23, 51, 60,-86, 29, 13, 87,-17, 73,-39,-51,-99,
68, 1, 1, 62, 30,-79]
wins7 = [ -8, -1, 68,-34, -7, 96,-37,-96, 26, 73, 47,-62,
-83,-76, 89, 77,-62, 18, -9,-75,-99,-36,-14,-50,
-36,-45, 50, 64,-83,-19]
wins8 = [ 85, 9, 79, 53, 75,-28, 49,-62,-25,-24,-89,-77,
13,-72,-54, 2,-95,-17,-80, -5, 8,-79, 59, 93,
-30,-77,-51,-79, 87,-35]
wins9 = [ 1, 72, 74,-20, 26, 49, 52,-25, 86,-72, 50, 97,
-50,-36,-74, -4, 65,-70, 78, 85, 25,-14,-93,-16,
-20,-24, 7, 28, -3, -5]

winsList =
[ wins1
, wins2
, wins3
, wins4
, wins5
, wins6
, wins7
, wins8
, wins9
]

wins = listArray ((1,1),(nx,nt)) $ concat winsList

adjList =
[ [1, 1, 0, 1, 0, 0, 0, 0, 0]
, [1, 1, 1, 0, 1, 0, 0, 0, 0]
, [0, 1, 1, 0, 0, 1, 0, 0, 0]
, [1, 0, 0, 1, 1, 0, 1, 0, 0]
, [0, 1, 0, 1, 1, 1, 0, 1, 0]
, [0, 0, 1, 0, 1, 1, 0, 0, 1]
, [0, 0, 0, 1, 0, 0, 1, 1, 0]
, [0, 0, 0, 0, 1, 0, 1, 1, 1]
, [0, 0, 0, 0, 0, 1, 0, 1, 1]
]

adj = listArray ((1,1),(nx,nx)) $ concat adjList

gamblor debugMode =
let f :: Int -> Int -> Maybe (Int, Int)
f 1 1 = Just (0, wins!(1,1))
f 1 x = Nothing
f t x =
let preds = [ (px, snd $ fromJust e) |
px <- [1..nx], adj!(px,x) == 1, let e = a!(t-1,px), isJust e ] (arbitraryPx, _) = head preds (pred, val) = maximumBy (compareBy snd) ([(arbitraryPx, 0)] ++ preds) in if null preds then Nothing else Just (pred, val + wins!(x,t)) a = listArray ((1,1),(nt,nx)) [ f t x | t <- [1..nt], x <- [1..nx] ] trace gambling (i@(t,x), Just (px,v)) = let v' = if gambling then v else 0 rest = if px == 0 then [] else trace (gambling && v/=0) ((t-1,px), a!(t-1,px)) in (i,v') : rest assocScore = snd . fromJust . snd validCells = filter (isJust . snd) $ assocs a bestAssoc = (maximumBy $ compareBy $ assocScore) validCells trail = reverse $ trace True $ bestAssoc fmt ((t,x),v) = let fmtn :: Int -> String
fmtn x =
let atx = a!(t,x)
cum = if isJust atx
then show $ snd $ fromJust atx
else “-”
in printf “%3d (%3d/%5s) ”
(x::Int)
(wins!(x,t)::Int)
cum
fmta :: Int -> String
fmta x’ = printf “%3s%13s” (if x == x’ then “^” else “”) “”
mkLine f = concat $ map f [1..nx]
in printf “%2d: ” t ++ mkLine fmtn ++ “\n” ++
printf “%2s ” “” ++ mkLine fmta ++ “\n”

fullDump = concat $ map fmt $ trail

synop =
let ((start,_),_) = last $ filter (\((t,x),v)->v==wins!(x,t)) trail
((stop,_),winnings) = last trail
n = start – 1
m = stop – start + 1
k = nt – stop
path = map (\((_,x),_) -> x) $ trail
in printf “winnings = %d, n = %d, m = %d, k = %d, path = %s”
(winnings::Int) (n::Int) (m::Int) (k::Int) (show path)

in seq (seqArray a) $
if debugMode then {-show trail-} fullDump else synop

main = putStrLn $ gamblor False

{-
result:
“winnings = 1745, n = 0, m = 30, k = 0, path = [1,4,7,4,1,1,1,1,2,2,1,1,4,1,4,5,5,5,6,9,6,5,8,5,6,5,4,7,8,5]”
-}
[/haskell]

Oh, one quick note about the following: my first attempt (Rec) ends up recursing. This is not a problem with the performance of the “DP” per se (since the subproblem results do get memoized), but rather with the laziness; it only occurs you try evaluating for n >= 9999 or something. The way around this is to unwind the schedule of thunk evaluation with seqArray, which I should add to my Haskell Commons library. Some day! Some day my libraries will see the light of that day.

[haskell]– See also the Python version.

module Main
where

import Data.Array

main = interact $ show . coinDenomOrig . read

fib :: Int -> Int
fib n =
let f i = if i >= 2 then a!(i-1) + a!(i-2) else 1
a = array (0,n) [ (i, f i) | i <- [0..n] ] in a!n coins = [1,5,10,25] coinDenomRec :: Int -> Int
coinDenomRec n =
let f i | i == 0 = 0
| otherwise = minimum [ f (i-j) + 1 | j <- coins, j <= i ] a = array (0,n) [ (i, f i) | i <- [0..n] ] in a!n coinDenomOrig :: Int -> Int
coinDenomOrig n =
let f i | i == 0 = 0
| otherwise = {-# SCC “f” #-} minimum [ a!(i-j) + 1 | j <- coins, j <= i ] a = array (0,n) [ (i, f i) | i <- [0..n] ] in a!n --seqArray a = foldr seq () [a ! k | k <- range (bounds a)] seqArray a = foldr seq () $ elems a coinDenom :: Int -> Int
coinDenom n =
let f i = {-# SCC “f” #-} minimum [ a!(i-j) + 1 | j <- coins, j <= i ] a = array (0,n) $ [ (0,0) ] ++ [ (i, f i) | i <- [1..n] ] in seqArray a `seq` a!n [/haskell]

[python]
def f(n):
coins = [1,5,10,25]
a = [0] * (n+1)
for i in xrange(1,n+1):
a[i] = min( a[i-j] + 1 for j in coins if j <= i ) return a[n] import sys print f(int(sys.stdin.read())) [/python]