Category Archives: languages

What’s there to like about R?

Update 10/11/2011: There’s a good discussion on Reddit
Update 10/12/2011: Note manipulate package and highlight data.table package

The R statistical computing platform is a rising star that’s been gaining popularity and attention, but it gets no respect in the hood. It’s telling that a popular guide to R is called The R Inferno, and that advocacy pieces are titled “Why R Doesn’t Suck.” Even the creator of R had this to say about the language in a damning article suggesting starting over with R:

I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too.

So why do people still use R? Would we lose anything if we just migrated to (say) Python, which many consider to be a major contender/alternative to R?

In this post I’m going to highlight a few things that are nice about R—not just in the platform itself, but in the whole ecosystem. These are things that you won’t necessarily find in alternate universes like Python’s.

Continue reading

Functional reactive programming for the web, or: where’s my Lunascript?!

Update: there’s some good discussion on Reddit.

Even when using the latest and greatest frameworks and disciplines, writing fast, highly interacting web applications involves a lot of accidental complexity:

First you need server code to figure out what data the browser needs. Hopefully you have an ORM layer, but you still need to carefully structure your code to minimize your backend dispatches, and you need to carefully keep that in sync with your front-end code lest you don’t fetch enough data or hurt performance by fetching too much. If it’s a Web 2.0-style app, you re-implement a ton of that server-side code in JavaScript, once for creating the page and then again as controller code for keeping it up to date and consistent. And when the user changes something, you bottle that up — typically in a custom over-the-wire format — and send it as an XHR to the server. The server has to de-serialize it into different data structures in a different language, notify the persistence store, figure out what other clients care about the change, and send them a notification over your Comet pipe, which is handled by yet more JavaScript controller code. Offline support? More code.

Strike a chord? That’s the motivation setting the stage for [Lunascript], a reactive programming framework for web apps. What?

Continue reading

Lisp scoping, dynamic scoping

Elisp is dynamically scoped. This means that on each let, we push a new binding onto the environment stack, and on every variable reference, we walk up this stack to find the nearest (lowest-level) binding for this variable’s name.

This is generally undesirable for composability. For instance, if we were to construct a closure to be passed further down the stack, then upon evaluation, a free variable could end up referring to a different variable than what was intended (it will refer to whatever the nearest matching binding is). So, in (map (lambda (x) (+ y x)) xs), we can only hope that map itself doesn’t bind the variable y, or else the occurrence of y in our closure would be captured by map‘s binding rather than the binding at our call site.

However, dynamic scoping can be useful at times. Case in point: Emacs has many global variables representing various editor settings. An example is inhibit-read-only, which disables buffer protection for read-only characters. If you’re writing a function that needs to update these read-only parts of the buffer, you can use dynamic scoping to bind inhibit-read-only, which will be effective across any lexical scope for as long as this binding is on the stack.

Common Lisp and Scheme are lexically scoped, like most modern mainstream languages. However, Scheme has a nice feature called fluid-let, which provides dynamic binding by assigning a new value to the provided variable for the duration (dynamic scope) of the fluid-let, then restoring the original value upon exit. This plays nicely with continuations as well, so that (for instance) we safely restore/set the value upon context-switching into/out of the thread, respectively.

Apparently, the earliest versions of Common Lisp attempted to implement lexical scoping, but inadvertently wound up with dynamic scoping. They actually managed to change the language to use lexical scoping.

Other languages may achieve similar effects via implicit parameters. Implicit Parameters: Dynamic Scoping with Static Types shows how to add this to Haskell (Hugs implements something based on this).

Thanks to Austin for this discussion.

ECMAScript Implementations

  • started evaluating the engines in trying to run the JS beautifier
  • tamarin
    • JIT engine donated by Adobe to Mozilla
    • used in Flash 9
    • written in C++
    • separate compiler (asc) and executor (avmplus)
    • the shell module contains a number of system/IO facilities
      (toplevel.as)
    • “Tamarin was designed to do better with type annotations than without. This is
      why SpiderMonkey actually wins some of those benchmarks against Tamarin on untyped code.
      Our work with SpiderMonkey and Tamarin now in the ActionMonkey project aims to make it
      fast for untyped code as well.”
  • rhino
    • written in Java
  • spidermonkey
    • written in C++
    • help()
    • load() includes javascript files
    • readline() doesn’t really work that great, since it
    • arguments for command-line args
  • es4
    • reference implementation of ECMAScript 4 (Javascript 2)
    • written in SML
    • very slow; it actually times out when running the beautifier (I get “Alarm
      Clock”)

Scala Doc Search; Issue Snapshot

Embarassingly hacked-together Scala documentation search here.

Also on my mind ATM:

  • covariance : pull :: contravariance : push
  • why does Prolog only use DFS?
  • Friday: distributed replay debugging
  • does GDB use write-protected pages on watchpoint register exhaustion?
  • practice ICDE talk on Thursday – colleagues are always more intimidating than strangers
  • popping on Saturday
  • need to get laundry right now

Facebook Programming Puzzles and Dynamic Programming in Haskell

My labmate pointed out to me that Facebook has some programming puzzles. I must have really been aching for Haskell again, because I decided to write up the first one, Evil Gambling Monster (Gamblor); it’s pasted below (TODO: upload the original problem somewhere for posterity). If we let A(t,x) = max possible score at time t at node x, then the recurrence relation is A(t,x) = delta score + (max over all possible previous positions px {A(t-1,px)}, or 0 if all the preceding are negative to capture the possibility of starting right at (t,x)). This one is actually fairly easy to reason about since any attempt to map out a table starting at 0,0 will take you to straight to the correct representation, which is the hardest part.

My first and only other Haskell DP is for the simple coin denomination problem, also pasted below. Haskell was fine for that, but I believe an imperative solution lends itself more naturally for Gamblor, because then we could avoid referencing ‘undefined’ cells of Nothings. Right now I’m a bit too sleepy to rewrite this in a mixed imperative/functional style (using, e.g., Scala or Python), but I’ll briefly try to explain what I mean.

The reason I believe an imperative solution is “superior” is because the data dependencies here are selective and resemble a sort of index-based permutation, and we’re trying to express this data flow with “pulling.” However, it could result in greater efficiency (and clarity) to express the propagation with “pushing,” i.e. destructively updating state (in this case, from the row earlier along the time dimension to the row later). This brought me to the realization that functional programming is “pull,” and imperative programming is “push.” This jives with my previously accepted perspectives on FP’s relative weaknesses – for instance, an imperative histogram builder is known to be more efficient because we push to a hash table instead of pulling tree map after tree map from the source data, and an imperative permutation inverter directly expresses the indices to mutate. In fact, both of these exhibit dictionary indirection in the push – perhaps that’s a more useful conclusion.

(To be sure, this is realized as a pretty small difference in the particular problem at hand – I doubt an imperative solution would be substantially better, especially in the face of a declarative style pervasive throughout the rest of the code. But use your imagination!)

Sadly, my solution is not pretty. Far from it. Again, too sleepy to care. 6.033 TA duties are teh sux.

(Update: cleaned up the code a bit and changed the output format to alternatively output just a synopsis.)

[haskell]
module Main
where

import Control.Arrow
import Data.Array
import Data.List
import Data.Maybe
import Text.Printf

— common —

seqArray a = foldr seq () $ elems a

compareBy f l r = f l `compare` f r

— gamblor, the evil gambling monster —

nx = 9
nt = 30

wins1 = [ 53,-84, 50,-73, 54, 60, 74, 22,-63,-78, 75, 72,
-46, 99,-33, 24, 6,-66, 77,-61,-60,-46,-52, 84,
91,-21,-52,-72,-39,-41]
wins2 = [ 77,-86,-25, 27,-59,-71,-13,-85, 50, 24,-63, 26,
-4,-10, 25, 62,-85,-68, 96, 92,-29,-64,-54, 18,
-79,-62, 97,-32,-35,-42]
wins3 = [ 27,-57,-28,-98, 69, 12,-70,-43, 27, 80, 80, 64,
6,-23,-45,-68,-60,-31,-36,-63,-39, 34,-27, 7,
-47, -7, 44,-50, 60,-90]
wins4 = [ 7,-12,-48, 79,-11,-78, -8, 19,-21,-81, -1,-40,
83,-95, 36,-62,-63, 76, 6, 0,-87, 67,-66,-15,
-26,-14, 78,-81, 36, 38]
wins5 = [-71,-56,-73,-20,-77, 15, 2, 14,-66, 81, 33, 33,
-59, 16, 37, 77, 53, 73, 53,-40,-26, 66,-73, 7,
-48, 1, 93,-70, 19, 30]
wins6 = [ 68, 47, 73, 94,-72, 96, 10, 30, 11, 44, 11,-56,
-23, 51, 60,-86, 29, 13, 87,-17, 73,-39,-51,-99,
68, 1, 1, 62, 30,-79]
wins7 = [ -8, -1, 68,-34, -7, 96,-37,-96, 26, 73, 47,-62,
-83,-76, 89, 77,-62, 18, -9,-75,-99,-36,-14,-50,
-36,-45, 50, 64,-83,-19]
wins8 = [ 85, 9, 79, 53, 75,-28, 49,-62,-25,-24,-89,-77,
13,-72,-54, 2,-95,-17,-80, -5, 8,-79, 59, 93,
-30,-77,-51,-79, 87,-35]
wins9 = [ 1, 72, 74,-20, 26, 49, 52,-25, 86,-72, 50, 97,
-50,-36,-74, -4, 65,-70, 78, 85, 25,-14,-93,-16,
-20,-24, 7, 28, -3, -5]

winsList =
[ wins1
, wins2
, wins3
, wins4
, wins5
, wins6
, wins7
, wins8
, wins9
]

wins = listArray ((1,1),(nx,nt)) $ concat winsList

adjList =
[ [1, 1, 0, 1, 0, 0, 0, 0, 0]
, [1, 1, 1, 0, 1, 0, 0, 0, 0]
, [0, 1, 1, 0, 0, 1, 0, 0, 0]
, [1, 0, 0, 1, 1, 0, 1, 0, 0]
, [0, 1, 0, 1, 1, 1, 0, 1, 0]
, [0, 0, 1, 0, 1, 1, 0, 0, 1]
, [0, 0, 0, 1, 0, 0, 1, 1, 0]
, [0, 0, 0, 0, 1, 0, 1, 1, 1]
, [0, 0, 0, 0, 0, 1, 0, 1, 1]
]

adj = listArray ((1,1),(nx,nx)) $ concat adjList

gamblor debugMode =
let f :: Int -> Int -> Maybe (Int, Int)
f 1 1 = Just (0, wins!(1,1))
f 1 x = Nothing
f t x =
let preds = [ (px, snd $ fromJust e) |
px <- [1..nx], adj!(px,x) == 1, let e = a!(t-1,px), isJust e ] (arbitraryPx, _) = head preds (pred, val) = maximumBy (compareBy snd) ([(arbitraryPx, 0)] ++ preds) in if null preds then Nothing else Just (pred, val + wins!(x,t)) a = listArray ((1,1),(nt,nx)) [ f t x | t <- [1..nt], x <- [1..nx] ] trace gambling (i@(t,x), Just (px,v)) = let v' = if gambling then v else 0 rest = if px == 0 then [] else trace (gambling && v/=0) ((t-1,px), a!(t-1,px)) in (i,v') : rest assocScore = snd . fromJust . snd validCells = filter (isJust . snd) $ assocs a bestAssoc = (maximumBy $ compareBy $ assocScore) validCells trail = reverse $ trace True $ bestAssoc fmt ((t,x),v) = let fmtn :: Int -> String
fmtn x =
let atx = a!(t,x)
cum = if isJust atx
then show $ snd $ fromJust atx
else “-”
in printf “%3d (%3d/%5s) ”
(x::Int)
(wins!(x,t)::Int)
cum
fmta :: Int -> String
fmta x’ = printf “%3s%13s” (if x == x’ then “^” else “”) “”
mkLine f = concat $ map f [1..nx]
in printf “%2d: ” t ++ mkLine fmtn ++ “\n” ++
printf “%2s ” “” ++ mkLine fmta ++ “\n”

fullDump = concat $ map fmt $ trail

synop =
let ((start,_),_) = last $ filter (\((t,x),v)->v==wins!(x,t)) trail
((stop,_),winnings) = last trail
n = start – 1
m = stop – start + 1
k = nt – stop
path = map (\((_,x),_) -> x) $ trail
in printf “winnings = %d, n = %d, m = %d, k = %d, path = %s”
(winnings::Int) (n::Int) (m::Int) (k::Int) (show path)

in seq (seqArray a) $
if debugMode then {-show trail-} fullDump else synop

main = putStrLn $ gamblor False

{-
result:
“winnings = 1745, n = 0, m = 30, k = 0, path = [1,4,7,4,1,1,1,1,2,2,1,1,4,1,4,5,5,5,6,9,6,5,8,5,6,5,4,7,8,5]”
-}
[/haskell]

Oh, one quick note about the following: my first attempt (Rec) ends up recursing. This is not a problem with the performance of the “DP” per se (since the subproblem results do get memoized), but rather with the laziness; it only occurs you try evaluating for n >= 9999 or something. The way around this is to unwind the schedule of thunk evaluation with seqArray, which I should add to my Haskell Commons library. Some day! Some day my libraries will see the light of that day.

[haskell]– See also the Python version.

module Main
where

import Data.Array

main = interact $ show . coinDenomOrig . read

fib :: Int -> Int
fib n =
let f i = if i >= 2 then a!(i-1) + a!(i-2) else 1
a = array (0,n) [ (i, f i) | i <- [0..n] ] in a!n coins = [1,5,10,25] coinDenomRec :: Int -> Int
coinDenomRec n =
let f i | i == 0 = 0
| otherwise = minimum [ f (i-j) + 1 | j <- coins, j <= i ] a = array (0,n) [ (i, f i) | i <- [0..n] ] in a!n coinDenomOrig :: Int -> Int
coinDenomOrig n =
let f i | i == 0 = 0
| otherwise = {-# SCC “f” #-} minimum [ a!(i-j) + 1 | j <- coins, j <= i ] a = array (0,n) [ (i, f i) | i <- [0..n] ] in a!n --seqArray a = foldr seq () [a ! k | k <- range (bounds a)] seqArray a = foldr seq () $ elems a coinDenom :: Int -> Int
coinDenom n =
let f i = {-# SCC “f” #-} minimum [ a!(i-j) + 1 | j <- coins, j <= i ] a = array (0,n) $ [ (0,0) ] ++ [ (i, f i) | i <- [1..n] ] in seqArray a `seq` a!n [/haskell]

[python]
def f(n):
coins = [1,5,10,25]
a = [0] * (n+1)
for i in xrange(1,n+1):
a[i] = min( a[i-j] + 1 for j in coins if j <= i ) return a[n] import sys print f(int(sys.stdin.read())) [/python]