Tuesday, August 10, 2010

Racket vs. Clojure

I've been asked by several people to explain why I use Clojure for my professional work rather than Racket.

ABOUT RACKET


I have been using Racket (a dialect of Scheme) for several years to teach kids how to program. Although Racket is a great first language, it's definitely not a "toy language". In fact, Racket offers a number of interesting features not found in other languages, making it an attractive option for real-world work. Racket puts into practice state-of-the-art research on macros, continuations, contracts, and interoperation between static and dynamically typed code. The integrated Scribble system makes it easy to provide high-quality documentation and/or write literate programs. It comes with a pleasant, lightweight IDE complete with an integrated debugger and profiler (as well as innovative features such as a specialized macro debugger).

I'm a fan of functional programming and dynamic typing. I know how to write and think in Racket from my many years teaching it, so with all these features, it should be a slam dunk for me to use it professionally, right?

Well, no....

IT'S ALL ABOUT THE DATA STRUCTURES


I have discovered that for me, the #1 factor that determines my programming productivity is the set of data structures that are built-in to the language and are easy to work with. For many years, Python set the standard for me, offering easy syntax to manipulate extensible arrays (called lists in Python), hash tables (called dictionaries in Python), tuples (an immutable collection that can serve as keys in a hash table), and in recent versions of Python, sets (mutable and immutable), heaps, and queues.

Racket, as a dialect of Scheme, places the greatest importance on singly-linked lists. OK, that's a reasonable starting point -- you can do a lot with linked lists. It also offers a vector, which is an old-fashioned non-extensible array that is fixed in length. (Who wants fixed-length arrays as a primary data structure any more? Even C++ STL offers an extensible vector...)

Vectors are mutable, which is both a plus and a minus. On the plus side, it allows you to efficiently write certain classes of algorithms that are hard to write with linked lists. It serves a purpose that is different from linked lists, so there is value to having both in the language. The huge minus is that Racket simply isn't oriented towards working conveniently with mutable vectors. Working with mutable data structures conveniently demands certain kinds of control structures, and certain kinds of syntaxes. You can write vector-based algorithms in Racket, but they look verbose and ugly. Which would you rather read:
a[i]+=3 or (vector-set! a i (+ (vector-ref a i) 3)) ?
But if you can get past the more verbose syntax, there's still the fundamental issue that all the patterns change when you move from using a list to a vector. The way of working with them is so fundamentally different that there is no easy way to change code from using one to another.

Racket goes further than most Scheme implementations in providing built-in data structures. It also offers, for example, hash tables (and recently sets were added). But the interface for interacting with hash tables is a total mess. The literals for expressing hash tables use dotted pairs. If you want to construct hash tables using the for/hash syntax, you need to use "values". If you want to iterate through all the key/value pairs of a hash table, it would be nice if there were an easy way to recursively process the sequence of key/value pairs the way you would process a list. Unfortunately, Racket provides no built-in lazy list/stream, so you'd need to realize the entire list. But even if that's what you'd want to do, Racket doesn't provide a built-in function to give you back the list of keys, values or pairs in a hash table. Instead, you're encouraged to iterate through the pairs using an idiosyncratic version of its for construct, using a specific deconstructing pattern match style to capture the sequence of key/value pairs that is used nowhere else in Racket. (Speaking of for loops, why on earth did they decide to make the parallel for loop the common behavior, and require a longer name (for*) for the more useful nested loop version?) Put simply, using hash tables in Racket is frequently awkward and filled with idiosyncracies that are hard to remember.

There are downloadable libraries that offer an assortment of other data structures, but since these libraries are made by a variety of individuals, and ported from a variety of other Scheme implementations, the interfaces for interacting with those data structures are even more inconsistent than the built-ins, which are already far from ideal.

I'm sure many programmers can live with the awkwardness of the built-in data structures to get the other cool features that Racket offers, but for me, it's a deal breaker.

ENTER CLOJURE


Clojure gets data structures right. There's a good assortment of collection types built in: lists, lazy lists, vectors, hash tables, sets, sorted hash tables, sorted sets, and queues. ALL the built-in data structures are persistent/immutable. That's right, even the *vectors* are persistent. For my work, persistent vectors are a huge asset, and now that I've experienced them in Clojure, I'm frustrated with any language that doesn't offer a similar data structure (and very few do). The consistency of working only with persistent structures is a big deal -- it means you use the exact same patterns and idioms to work with all the structures. Vectors are just as easy to work with as lists. Equality is simplified. Everything can be used as a key in a hash table.

Data structures in Clojure get a little bit of syntactic support. Not a tremendous amount, but every little bit helps. Code is a little easier to read when [1 2 3] stands out as a vector, or {:a 1, :b 2, :c 3} stands out as a hash table. Lookups are a bit more terse than in Racket -- (v 0) instead of (vector-ref v 0). Hash tables are sufficiently lightweight in Clojure that you can use them where you'd use Racket's structs defined with define-struct, and then use one consistent lookup syntax rather than type-specific accessors (e.g., (:age person) rather than (person-age person)). This gets to be more important as you deal with structures within structures, which can quickly get unwieldy in Racket, but is easy enough in Clojure using -> or get-in. Also, by representing structured data in Clojure as a hash table, you can easily create non-destructive updates of your "objects" with certain fields changed. Again, this works just as well with nested data. (Racket structs may offer immutable updates in future versions, but none of the proposals I've seen address the issue of updating nested structured data.) Furthermore, Clojure's associative update function (assoc) can handle multiple updates in one function call -- contrast (assoc h :a 1 :b 2) with (hash-set (hash-set h 'a 1) 'b 2).

Even better, the process for iterating through any of these collections is consistent. All of Clojure's collections can be treated as if they were a list, and you can write algorithms to traverse them using the same pattern of empty?/first/rest that you'd use on a list. This means that all the powerful higher-order functions like map/filter/reduce work just as well on a vector as a list. You can also create a new collection type, and hook into the built-in sequence interface, and all the built-in sequencing functions will automatically work just as well for your collection.

Although the sequencing functions work on any collection, they generally produce lazy lists, which means you can use good old recursion to solve many of the same problems you'd tackle with for/break or while/break in other languages. For example, (first (filter even? coll)) will give you the first even number in your collection (whether a list, vector, set, etc.) and it will do so in a space-efficient manner -- it doesn't need to generate an intermediate list of *all* the even numbers in your collection. Some garbage is generated along the way, but it can be garbage collected immediately and with relatively little overhead. Clojure also makes it easy to "pour" these lazy sequences into the collection of your choice via into. Racket's lack of a built-in lazy list makes it difficult to use map/filter/etc. for general processing of collections. If you use map/filter/etc., you potentially generate a lot of intermediate lists. You can use a stream library, but it was probably designed for other Scheme dialects with a naming scheme for the API that doesn't match Racket's built-in list functions or integrate well with Racket's other sequencing constructs. So often you end up writing the function you need from scratch (e.g., find-first-even-number) rather than composing existing building blocks. In some special cases, you can use one of the new for constructs, like in this case, for/first.

A polymorphic approach is applied through most of Clojure's design. assoc works on vectors, hash tables, sorted hash tables, and any other "associative" collection. And again, you can hook into this with custom collections. This is far easier to remember (and more concise to write) than the proliferation of vector-set, hash-set, etc. you'd find in Racket. It also makes the various collections more interchangeable in Clojure, making it easier to test different alternatives for performance implications with fewer, more localized changes to one's code.

Summary:

  • Clojure provides a full complement of (immutable!) data structures you need for everyday programming and a bit of syntactic support for making those manipulations more concise and pleasant.
  • All of the collections are manipulated by a small number of polymorphic functions that are easy to remember and use.
  • Traversals over all collections are uniformly accomplished by a sequence abstraction that works like a lazy list, which means that Clojure's higher order sequence functions also apply to all collections.


CLOJURE'S NOT PERFECT


The IDEs available for Clojure all have significant drawbacks. You can get work done in them, but any of the IDEs will probably be a disappointment relative to what you're used to from other languages (including Racket).

Debugging is difficult -- every error generates a ridiculously long stack trace that lists 500 Java functions along with (maybe, if you're lucky) the actual Clojure function where things went awry. Many of Clojure's core functions are written with a philosophy that they make no guarantees what they do with bad input. They might error, or they might just return some spurious answer that causes something to blow up far far away from the true origin of the problem.

Clojure inherits numerous limitations and idiosyncracies from Java. No tail-call optimization, no continuations. Methods are not true closures, and can't be passed directly to higher-order functions. Proliferation of nil and null pointer exceptions. Slow numeric performance. Compromises with the way hashing and equality works for certain things to achieve Java compatibility. Slow startup time.

Some people love Clojure specifically because it sits on top of Java and gives them access to their favorite Java libraries. Frankly, I have yet to find a Java library I'd actually want to use. Something about Java seems to turn every library into an insanely complex explosion of classes, and Java programmers mistakenly seem to think that JavaDoc-produced lists of every single class and method constitutes "good documentation". So for me, the Java interop is more of a nuisance than a help.

Clojure has a number of cool new ideas, but many of them are unproven, and only time will tell whether they are truly valuable. Some people get excited about these features, but I feel fairly neutral about them until they are more road-tested. For example:

  • Clojure's STM implementation - seems promising, but some reports suggest that under certain contention scenarios, longer transactions never complete because they keep getting preempted by shorter transactions.
  • agents - if the agent can't keep up with the requests demanded of it, the agent's "mailbox" will eventually exhaust all resources. Perhaps this approach is too brittle for real-world development?
  • vars - provides thread isolation, but interacts poorly with the whole lazy sequence paradigm that Clojure is built around.
  • multimethods - Clojure provides a multimethod system that is far simpler than, say CLOS, but it requires you to explicitly choose preferences when there are inheritance conflicts, and early reports suggest that this limits extensibility.
  • protocols - This is an interesting variation on "interfaces", but it's not clear how easy it will be to compose implementations out of partial, default implementations.
  • transients - Nice idea for speeding up single-threaded use of persistent data structures. Transients don't respond to all the same interfaces as their persistent counterparts, though, limiting their usefulness. Transients are already being rethought and are likely to be reworked into something new.


So it's hard for me to get excited about these aspects of Clojure when it remains to be seen how well these features will hold up under real-world use.

I'm sure that for many programmers, Clojure's drawbacks or unproven ideas would be a deal breaker. We all care about different things. But for me, Clojure's clean coherent design of the API for working with the built-in data structures is so good, that overall, I prefer working in Clojure to working in Racket.

18 comments:

  1. Cool post. I can answer one of your concerns about protocols. They are simply the most awesome way to compose implementations I have ever seen.

    First, take a look at the official documentation:

    http://clojure.org/protocols

    Here's one of the examples from the docs

    (extend AType
    AProtocol
    {:foo an-existing-fn
    :bar (fn [a b] ...)
    :baz (fn ([a]...) ([a b] ...)...)}
    BProtocol
    {...}
    ...)

    This is where it is important to remember that Clojure doesn't have any syntax (as a Scheme guy you "get" this). We're really building a data structure. So, the s-exp above can be defined like this.

    (def default-impl
    {:foo an-existing-fn
    :bar (fn [a b] ...)
    :baz (fn ([a]...) ([a b] ...)...)})

    (extend AType
    AProtocol
    default-impl
    BProtocol
    {...}
    ...)

    Where this gets awesome is when you want to mix and match stuff.


    (extend MyType
    AProtocol
    (merge default-impl {:foo my-override})
    BProtocol
    {...}
    ...)

    BAM! I've overridden foo with my specific implementation. More importantly, I did it using hash-map collision (spelled out specifically in merge). What this let me do is define my own inheritance rules a la carte. That's why I think Clojure Protocols are so awesome!

    ReplyDelete
  2. Regarding Clojure's STM and agents, can you provide citations for your claims?

    If agents are "too brittle for real-world development", what is it about it that makes it so, and what alternatives do not suffer from this brittleness?

    It just seems like one can select any particular concurrency approach, and find many examples of people with problems getting them to work right for their particular situation.

    Did Erlang get it right with its actor model? Did Scala get it right with its actor model? Is there any example of a concurrency system that's somehow impervious to problems for all possible workload scenarios?

    ReplyDelete
  3. Thanks Sean, for clarifying Clojure's mechanism for combining default implementations.

    I still am eager to see how this will work out in practice.

    1. As far as I know, you can only combine maps in this way within an "extend", not in the syntax used by "deftype" or "defrecord" (is this still the case?) And implementation of protocols in extend comes with a slight performance penalty versus coding it directly into the deftype. How much of a penalty is it? Will coders constantly be looking for ways to avoid using default implementations in order to code implementation of protocols directly into the type in order to get the most speed?

    2. In practice, with a large set of types and protocols, will it be easy to keep track of all the default implementations available, and understand how to safely mix and match them. The pros and cons of implementation inheritance in OO are now well understood, but it's hard to know how pleasant or unpleasant it will be to identify and then mix and match implementations by combining maps of functions. Implementation inheritance can be confusing, but it also allows you to easily express "do something custom and then do the regular (super) thing". Will the same sorts of customizations be more or less confusing with this system? What kind of forethought will be required by library designers to make sure that default implementations are easy to mix and match. For example, in OO, it's generally easy to override some aspect of an implementation whether or not the writer of the implementation intended it (provided he didn't explicitly seal/finalize the class), but in Clojure, the default style of implementing protocols within a deftype/defrecord doesn't actually create a map of functions for composing and merging. The author of the protocol must explicitly design and provide those maps -- how many will take the time to do that in practice?

    I think that all this will become clearer once Clojure's own default implementation classes (e.g., APersistentMap) are replaced with default implementations intended to be composed.

    My prediction is that the basic mechanism will prove to be a good one, but that Clojure will eventually adopt a way to express the use of default implementations directly within the non-maplike "DSL" used in deftype/defrecord.

    ReplyDelete
  4. Chepprey,

    The problem of short transactions getting continually preempted by long transactions has come up from time to time on the google group. You could start by searching on "livelock". Rich Hickey has acknowledged that this can happen, but he believe it shows up pretty quickly under load testing, and can be designed around. There have been a few scattered reports from people who have in fact had this problem in their own code, and needed help from the group to try to work around the problem. I don't have a good sense for how much the STM is getting used by folks, so it's hard to know whether these reports are isolated flukes, or a significant percentage of real-world use.

    I haven't seen people complaining of agents getting overloaded in this way but it seems clear to me that it could be a problem in real systems, especially long running systems with variable traffic patterns. Erlang's version of the actor model has at least a couple of advantages in this respect. First of all, (I believe) Erlang allows you to pattern match against certain kinds of messages, and throw the rest away. Clojure, on the other hand, allows you to send any kind of function to an agent and it will process it. This kind of openness places far more power in the hands of a client to overload the agent with computationally-intensive requests. Second, Erlang systems are built under the assumption that things (including its actors) will eventually fail over time, and it has a number of extremely robust mechanisms and patterns for dealing with faults and automatically restarting actors and coordinating with other actors who failed to acknowledge that they have processed your message. I know that Clojure recently made some modifications to the way that agents announce failure and get restarted, but I believe it's still a far cry from Erlang's fault tolerance.

    Yes, it's easy to poke holes at any existing approach to concurrency. I'm not aware of a "perfect system". My point is simply that when people point at Clojure as being the next, great thing in concurrency, I find that my own personal reaction is a more tempered, "Yes, it's got some cool ideas, but the jury's still out on whether they are really much more useful in the real world than existing approaches, or whether they just have a different and less obvious set of traps."

    ReplyDelete
  5. In support of Racket, we have come up with a library of data structures in Typed Racket (a statically typed dialect of Racket). And we are in the process of integrating it into the Racket release. Our investigation shows that the data structures are faster than the currently available data structures and has has the features that is pointed out to be missing. Heres a little bit about the work.

    http://www.ccs.neu.edu/scheme/pubs/sfp10-kth.pdf

    http://www.ccs.neu.edu/home/krhari/pfds/

    >> Racket provides no built-in lazy list/stream

    Our library has a has a stream implementation too.

    We have a hash-list implementation, based on Phil Bagwell's work, thru' which we have tried to get rid of the drawbacks of hash-tables that you have mentioned.

    Some more work needs to be done but I can say that Racket is almost there in terms of data structures.

    ReplyDelete
  6. While I feel your pain when it comes to most Java libraries, I have to say, at least we have the option of using these libraries. If Clojure was not on the JVM and didn't have Java interop, I doubt it would be anywhere near as popular as it is today, simply because it would lack libraries for very simple things.

    Nobody is forcing you to actually use those Java libraries, and nothing is forcing you to not code in straight Clojure if you so desire. I don't see how it's a nuisance simply because it's an option.

    ReplyDelete
  7. @Rayne

    popularity doesn't mean something is actually good.

    on the other hand, it probably means the bugs will get worked out.

    ReplyDelete
  8. There are a number of mentions in your post about 'reports' of things. Can you provide the sources to those? I'd love to read more about many of these problems.

    ReplyDelete
  9. @lakemalcolm

    For a sample thread about the STM, search Google Groups for "Long-running STM updates will never complete... is there a solution?"

    There were several threads about the complications that arise from needing to spell out inheritance preferences. One of the longest threads had the subject "oo" and ended with Rich saying, "Hang tight - I hear your concerns and am thinking about them."

    ReplyDelete
  10. @hari - Great news about the library you're developing; I'm looking forward to trying it out.

    ReplyDelete
  11. Puzzler,

    As a member of the Racket core team, I want you to know we're glad to read articles like this and use them to improve our platform. I've already incorporated some of your feedback into the development version.

    Thank you,

    Jay McCarthy

    ReplyDelete
  12. I feel exactly the same as you, however, my biggest beef with Clojure is that the startup time of the JVM makes it prohibitive to creating small scripts to be invoked from the command-line. Every time I have to start a JVM it feel like my Macbook freezes for a few seconds. Thanks to the REPL experience development on a VM running in the background works fine, but that's definitely a drawback.

    ReplyDelete
  13. Re: "Clojure provides a full complement of (immutable!) data structures", what is the primary advantage of the immutability? Your code's efficiency? Your code's quality?

    Would it be difficult to code your own data structure library in Racket to solve the problems identified here? Do you not undertake that because having another interpretation layer (the library) involved could complicate debugging?

    Thank you

    ReplyDelete
  14. @George
    I mostly write code that involves searches with backtracking. With immutable data structures, this is easy -- make a speculative change, continue searching and if it doesn't work out, just throw it away and go back to the earlier version. Without immutable data structures, the choice is to either:
    a) store information needed to undo changes -- this is complex to do correctly
    b) make a full copy of the mutable data structure before each change -- this is inefficient.

    So immutable data structures help with both correctness and efficiency for this kind of problem.

    I don't think it's impossible to write such a library in Racket, but the implementation in Clojure is highly optimized, and duplicating that would be a time-consuming effort.

    ReplyDelete
  15. "Some people love Clojure specifically because it sits on top of Java and gives them access to their favorite Java libraries. Frankly, I have yet to find a Java library I'd actually want to use. Something about Java seems to turn every library into an insanely complex explosion of classes, and Java programmers mistakenly seem to think that JavaDoc-produced lists of every single class and method constitutes "good documentation". So for me, the Java interop is more of a nuisance than a help."

    is exactly the same I feel!!!...I hate java..I hate the structure java code and I feel than coding with clojure is really pleasant..but when I need use a java lib the code became really ugly and not so expressive...I don't feel than jvm be a nice environment for a functional language because it has several lacks than affect to clojure...the recursion isn't so nice as haskell and I hate using partial for currying..It seem a little thing but really affect the code aspect

    ReplyDelete
  16. "I'm sure many programmers can live with the awkwardness of the built-in data structures to get the other cool features that Racket offers, but for me, it's a deal breaker."

    @mark, "cool features" is quite an understatement here, and I think it should be made clear just what Racket offers that Clojure can not.

    Clojure's data types, syntax, ie {}, [], and uniform interface is very convenient, I admit. But you're missing the point of Racket if that alone is enough to make you switch.

    In about 5 lines of Racket, you can mimic the setting/getting semantics of Clojure's maps, vectors, sets, etc, so that you turn the default:

    (hash-set h 'a (+ 1 (hash-ref h 'a))) into
    (h 'a (+ 1 (h 'a)))

    For example, put this:

    #lang racket

    (provide (except-out (all-from-out racket) #%app)
    (rename-out [app #%app]))

    (define-syntax app
    (syntax-rules ()
    [(app f a) (if (hash? f) (hash-ref f a) (f a))]
    [(app f a b) (if (hash? f) (hash-set f a b) (f a b))]
    [(app f . a) (f . a)]))

    into a file called "rackjure.rkt", and then use it like so:

    #lang s-exp "rackjure.rkt"

    (define h (hash 'a 1))

    (h 'a) ;; prints 1
    (h 'b (add1 (h 'a))) ;; prints '#hash((b . 2) (a . 1))

    In a similar manner, we could hook into Racket's readtable and allow for (map, vector, set, ...) data types to have syntax that mimics those in Clojure, ie:

    (define h1 {:a 1})
    (define h2 {:z 26})
    (define vec [h1 h2])

    Racket, more than any other language, exposes an API to the compiler that allows even novice developers to incorporate both the semantics and syntax of, say, Clojure, or Erlang, or Haskell (more Haskell), or whatever else appeals to them.

    This makes Racket more powerful than Clojure (or even Scheme). You couldn't implement Prolog and Algol 60 in Clojure, and then have their modules work seemlessly together in the same application. (links)

    To give up such power for features of Clojure that can be trivially implemented or included as libraries is throwing the baby out with the bathwater.

    I agree with everything you like about Clojure, and I agree that Racket has much "awkwardness" by comparison. But rather than choose Clojure over Racket, I prefer to regard Racket's clunkiness as the "default configuration" of a limitless language building toolkit - a toolkit I can easily change to be more Clojure-esque. In the future, I can again change Racket to mimic some new semantics the kids are raving about, long after Clojure has become passe. ;)

    For more Clojure semantics in Racket, see Greg's project Rackjure

    for more info on what distinguishes Racket macros from those of Scheme, see this paper

    Note: I realize Clojure is still a LISP, and so compared to most languages it provides programmers quite a bit of syntactic and semantic "power" in its own right. But Racket has taken that kind of thinking to the next level.

    ReplyDelete
    Replies
    1. @Scott, you make a very compelling argument for Racket. I like what rackjure has done to improve the 'syntax' of racket. The ability to turn Racket into any language you like is also attractive. I certainly miss pattern matching of Haskell and it would be great to have that.
      How extensive are the Racket libraries? Unlike Mark, I am happy to have the comfort of knowing that Java provides Clojure with an extensive set of libraries. At least the Java library could be utilised to prove concepts and, if they are found to be problematic, one could then set about rolling one's own. What is the state of Racketland?

      Delete
  17. Puzzler: I mostly write code that involves searches with backtracking.

    I'm working on a very fancy backtracking library for Racket, you could be my first tester :3

    Also I have an object/class library for Racket that I'm working on. It's main feature is that sends are about 4 times faster than the built in class system and New is about 11 times faster (the underlying mechanism is similar to c++'s). You might find the context sensitive syntax of sends is more readable than lisp's differing name per type.

    I don't know if typed Racket has better a better object system and collections, but I find the limitation of working with typed variables too annoying to care.

    JoshuaScholar -at- gmail.com

    ReplyDelete