Multiplexing work reliably and efficiently with state machines and core.async

This all started with a tweet and brief exchange with a friend and ex-colleague:

This post describes some work we did recently that I’m pretty happy with: we model the execution of independent pieces of work as state machines which are executed concurrently by multiple core.async processes with state communicated over channels. 

Modeling with state machines helps flatten call chain complexity and makes retrying/recovering from error states trivial: we just try to apply the same transition to the same state again.

In short, we've improved both throughput and reliability.

Context

Specifically our problem was:

  1. Connect to a reporting API and download a report
  2. Transform the report, converting some values between units, currencies etc.
  3. Write the report out to S3 (ultimately to be ingested into our Redshift cluster)

It’s a pretty simple problem, but when we’re downloading thousands of reports every day it’s likely that we’ll come across intermittent problems: mostly network errors connecting to the APIs, or being rate throttled.

We started simple, making the most of the environment our system ran in.

Our original code was similar to this:
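We’ve since replaced it, but the shape was roughly this (a sketch rather than the actual code; download-report, transform-report and upload-report are hypothetical stand-ins for the API client, the conversion code and the S3 upload):

(defn process-report!
  [client day]
  (-> (download-report client day)   ; fetch the day's aggregate report for the client
      (transform-report)             ; convert units, currencies etc.
      (upload-report)))              ; write the result out to S3

(defn process-all!
  [clients days]
  (doseq [client clients
          day    days]
    (process-report! client day)))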

The reporting API lets us download aggregate information for a day and client.

Ensuring processes completed was the responsibility of a supervisor process. Although this was beautifully simple for the incremental work, it was incredibly inefficient when running large imports:

  1. Our unit of work was all the steps needed to download, process and upload a report. If any step failed we could only retry the whole thing.
  2. Even worse, if we were processing hundreds or thousands of reports together, any failure would terminate the run and prevent all subsequent reports from being processed. We could unpick the progress made and change some command-line options to avoid redoing too much, but it was painful.
  3. Handling errors was slow and painful. If our download request was rate limited we’d have to back off; operations were globally serial, though, so any delay slept everything.

State machines, core.async and concurrent execution

Instead of weaving function invocations together we can model the problem as a set of state machines, each progressing independently.

Our transitions are pretty similar to the steps listed at the beginning: :downloadable -> :uploadable -> :completed. Each report (a combination of client and day) will progress from :downloadable to :completed.

The local order of these operations is important (we can’t upload the report before we’ve downloaded it) but the global order isn’t- it doesn’t matter whose report we process first. It’s also important to note that our operations are idempotent: it doesn’t matter if we download (or try to download) the same report multiple times, and likewise with the upload.

At each step our code will use the machine’s :state to determine which transition to apply. If we encounter an error whilst performing a transition we attach :error to the state with the exception, letting the machine retry the operation.
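A sketch of that transition function (reusing the hypothetical download/transform/upload helpers from the earlier sketch; the real code is similar in spirit):

(defn step
  "Applies the next transition to a report based on its current :state.
   Exceptions are caught and attached under :error so the same
   transition can be retried later."
  [{:keys [state] :as report}]
  (try
    (case state
      :downloadable (assoc report
                           :state :uploadable
                           :data  (transform-report
                                    (download-report (:client report) (:day report))))
      :uploadable   (do (upload-report (:data report))
                        (assoc report :state :completed)))
    (catch Exception e
      (assoc report :error e))))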

Our final code looks pretty close to the following:
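A simplified sketch of that shape, using the step function above (channel sizes, the back-off delay and the wiring are illustrative rather than the production code):

(require '[clojure.core.async :refer [chan go-loop thread timeout <! >! >!! <!!]])

(defn process-reports
  [reports n-workers]
  (let [states-ch (chan 1024)   ; buffered so workers can re-queue retries without blocking
        completed (chan 1024)]
    (dotimes [_ n-workers]
      (go-loop []
        (when-let [state (<! states-ch)]
          ;; run the (blocking) transition on its own thread and read the result back
          (let [state' (<! (thread (step state)))]
            (cond
              (= :completed (:state state')) (>! completed state')
              (:error state')                (do (<! (timeout 5000))   ; small delay before retrying
                                                 (>! states-ch (dissoc state' :error)))
              :else                          (>! states-ch state')))
          (recur))))
    (doseq [report reports]        ; seed one machine per report
      (>!! states-ch report))
    (dotimes [_ (count reports)]   ; wait until every machine has completed
      (<!! completed))))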

  1. We create the states-ch channel to communicate each state of each machine (i.e. our unit of work).
  2. Processes are started to progress each of the state machines.
  3. Once a machine’s :state is :completed the final state is put on the completed channel (this helps us know when all work is finished).
  4. States are read from the states-ch channel and the appropriate operation performed. The result of each operation is the new state. We use thread to perform the operation and return a channel we can read the result from. 
  5. If an operation causes an exception to be raised it’s caught and we associate the exception value to the state map. After a small delay we put the state map back into the states channel for the operation to be attempted again.

Modeling the problem with state machines and implementing it with core.async gives a few nice properties:

  1. Operations are transparent. It’s easy to see what’s going on at any point in time.
  2. Failure is isolated and easily retryable: all values needed to perform an operation are held in the state maps so it’s really just a case of applying (step state) again.
  3. We use core.async’s timeout channel to defer the retry operation, letting us switch to a different op first.
  4. Overall throughput is increased. If we need to defer an operation we proceed with something else. Even on my 4-core laptop it results in ~6x greater throughput.

In short, we’re able to process many more reports concurrently than we were able to with our initial ‘dumb’ implementation and the code, I think, is vastly more readable than the nested error handling we used to have.

Introducing Blueshift: Automated Amazon Redshift ingestion from Amazon S3

I’m very pleased to say I’ve just made public a repository for a tool we’ve built to make it easier to ingest data automatically into Amazon Redshift from Amazon S3: https://github.com/uswitch/blueshift.

Amazon Redshift is a wonderfully powerful product, if you’ve not tried it yet you should definitely take a look; I’ve written before about the value of the analytical flow it enables.

However, as nice as it is to consume data from, ingesting data is a little less fun:

  1. Forget about writing raw INSERT statements: we saw individual inserts take on the order of 5 or 6 seconds per record. Far too slow to support our usual stream, let alone the flood when we reprocess the historical transaction log.
  2. Loading data from S3 is definitely the right thing to do. And, if you have a purely immutable append-only flow, this is trivial. However, if you need upsert behaviour you’ll have to write it yourself. Whilst that’s not complicated, we have many different systems and teams pushing data into Redshift, and each has a slightly different implementation of the same thing.
  3. We’ve seen lots of SQLException 1023 errors when we were reprocessing the historical transaction log across many different machines (our transaction log is itself already on S3 stored as lots of large Baldr files); with n machines in the cluster we’d end up with n transactions inserting into the same table.
  4. Redshift performs best when running loads operating over lots of files from S3: work is distributed to the cluster nodes and performed in parallel. Our original solution ended up with n transactions loading a smaller number of files.

Rationale

Blueshift makes life easier by taking away the client’s need to talk directly to Redshift. Instead, clients write data to S3 and Blueshift automatically manages the ingestion into Redshift.

Blueshift is a standalone service written in Clojure (a dialect of Lisp that targets the JVM) that is expected to be deployed on a server. It is configured to watch an S3 bucket; when it detects new data files and a corresponding Blueshift manifest it will perform an upsert transaction for all the files with the additional benefit of being a single import for a large number of files.

Using

You can build and run the application with Leiningen, a Clojure build tool:
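Something along these lines (the exact invocation and flags are an assumption on my part; check the repository README for the current command):

$ lein run -- --config etc/config.edn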

The configuration file requires a minimal number of settings:
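A hypothetical example to give a feel for the shape; only :key-pattern and :telemetry :reporters are discussed below, and the other keys (bucket, credentials) are assumptions here rather than the definitive schema:

{:s3 {:credentials {:access-key "..."
                    :secret-key "..."}
      :bucket      "blueshift-data"
      :key-pattern ".*production.*"}
 :telemetry {:reporters [...]}}   ; e.g. the bundled log reporter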

Most of it is relatively obvious but for the less obvious bits:

  • :key-pattern is a regex used to filter the directories watched within the S3 bucket; this makes it easier to have a single bucket with data from different environments.
  • :telemetry :reporters configures the Metrics reporters; a log reporter is included but we have an extra project (blueshift-riemann-metrics) for pushing data to Riemann.

Once the service is running you just need to write your delimited data files to the S3 bucket being watched and create the necessary Blueshift manifest. Your S3 layout might look something like this:
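For example (the bucket and file names are purely illustrative):

s3://blueshift-example-bucket/directory-a/foo/manifest.edn
s3://blueshift-example-bucket/directory-a/foo/2014-02-01.tsv
s3://blueshift-example-bucket/directory-a/foo/2014-02-02.tsv
s3://blueshift-example-bucket/directory-b/manifest.edn
s3://blueshift-example-bucket/directory-b/other-data.tsv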

When Blueshift polls S3 it will pick up that 2 directories exist and create a watcher for each directory (directory-a/foo and directory-b in our example).

When the directory-a watcher runs it will notice there are 2 data files to be imported and a Blueshift manifest, manifest.edn: this is used to tell Blueshift where and how to import the data.
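A manifest might look something like the following (the values are illustrative, and the exact keys and value formats are best checked against the project README):

{:table        "testing"
 :pk-columns   ["id"]
 :options      ["DELIMITER '\t'" "IGNOREHEADER 1"]
 :data-pattern ".*\\.tsv$"}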

Of note in the example above:

  • :pk-columns is used to implement the merge upsert: rows in the target table (testing in the above example) that exist in our imported data are removed before the load: Redshift doesn’t enforce PK constraints so you have to delete the data first.
  • :options can contain any of the options that Redshift’s COPY command supports.
  • :data-pattern helps Blueshift identify which files are suitable for being imported.

When the data has been imported successfully the data files are deleted from S3 leaving the Blueshift manifest ready for more data files.

Summary

We’re delighted with Amazon’s Redshift service and have released Blueshift to make it easier for people to optimally ingest data from their applications via S3 without talking to Redshift directly.

I’d love to hear from people if they find Blueshift useful and I’d be especially delighted for people to contribute (it’s currently less than 400 lines of Clojure).

Clojure - From Callbacks to Sequences

I was doing some work with a colleague earlier this week which involved connecting to an internal RabbitMQ broker and transforming some messages before forwarding them to our Kafka broker.

We’re using langohr to connect to RabbitMQ. Its consumer and queue documentation shows how to use the subscribe function to connect to a broker and print messages that arrive:
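Their example is roughly like the following (paraphrased; the option syntax has changed between langohr versions, with newer ones taking an options map):

(require '[langohr.core      :as rmq]
         '[langohr.channel   :as lch]
         '[langohr.queue     :as lq]
         '[langohr.consumers :as lc])

(defn message-handler
  [ch msg-meta ^bytes payload]
  (println (format "Received: %s" (String. payload "UTF-8"))))

(let [conn  (rmq/connect)
      ch    (lch/open conn)
      queue "my.queue"]
  (lq/declare ch queue)
  (lc/subscribe ch queue message-handler {:auto-ack true}))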

The example above is pretty close to what we started working with earlier today. It’s also quite similar to a lot of other code I’ve written in the past: connect to a broker or service and provide a block/function to be called when something interesting happens.

Sequences, not handlers

Although there’s nothing wrong with this I think there’s a nicer way: flip the responsibility so instead of the subscriber pushing to our handler function we consume it through Clojure’s sequence abstraction.

This is the approach I took when I wrote clj-kafka, a Clojure library to interact with LinkedIn’s Kafka (as an aside, Kafka is really cool- I’m planning a blog post on how we’ve been building a new data platform for uSwitch.com but it’s well worth checking out).

Here’s a little example of consuming messages through a sequence that’s taken from the clj-kafka README:
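It went something like this (paraphrased from memory, so treat the exact function names and config keys as approximate):

(use 'clj-kafka.core)
(use 'clj-kafka.consumer.zk)

(def config {"zookeeper.connect"  "localhost:2181"
             "group.id"           "my-consumer-group"
             "auto.offset.reset"  "smallest"
             "auto.commit.enable" "false"})

(with-resource [c (consumer config)]
  shutdown
  (take 2 (messages c "test-topic")))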

We create our consumer and access messages through a sequence abstraction by calling messages with the topic we wish to consume from.

The advantage of exposing the items through a sequence is that it becomes instantly composable with the many functions that already exist within Clojure: map, filter, remove etc.

In my experience, when writing consumption code that uses handler functions/callbacks I’ve ended up with code that looks like this:
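A sketch of the shape I mean (parse-payload, interesting? and forward-to-kafka! are hypothetical helpers, and lc is the langohr alias from the earlier example). Parsing, filtering and forwarding all end up nested inside the callback:

(lc/subscribe ch queue
              (fn [ch msg-meta ^bytes payload]
                (let [msg (parse-payload payload)]
                  (when (interesting? msg)
                    (forward-to-kafka! msg))))
              {:auto-ack true})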

It makes consuming data more complicated and pulls more complexity into the handler function than necessary.

Push to Pull

This is all made possible thanks to a lovely function written by Christophe Grand:
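It’s reproduced below from memory, so treat it as a close approximation rather than verbatim:

(defn pipe
  "Returns a vector containing a sequence that will read from a
   queue, and a function that inserts items into that queue."
  []
  (let [q   (java.util.concurrent.LinkedBlockingQueue.)
        EOQ (Object.)
        NIL (Object.)
        s   (fn queue-seq []
              (lazy-seq
                (let [x (.take q)]
                  (when-not (= EOQ x)
                    (cons (when-not (= NIL x) x)
                          (queue-seq))))))]
    [(s) (fn queue-put
           ([] (.put q EOQ))
           ([x] (.put q (or x NIL))))]))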

The function returns a vector containing 2 important parts: the sequence, and a function to put things into that sequence.

Returning to our original RabbitMQ example, we can change the subscriber code to use pipe to return the sequence that accesses the queue of messages:
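A sketch of how that looks (subscriber-seq is our own wrapper; the langohr calls are as in the earlier example):

(defn subscriber-seq
  "Subscribes to the queue and returns a lazy sequence of messages."
  [ch queue]
  (let [[msg-seq put] (pipe)]
    (lc/subscribe ch queue
                  (fn [ch msg-meta ^bytes payload]
                    (put {:payload payload :ch ch :msg-meta msg-meta}))
                  {:auto-ack true})
    msg-seq))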

We can then map, filter and more.

We pull responsibility out of the handler function and into the consumption of the sequence. This is really important, and it complements something else which I’ve recently noticed myself doing more often.

In the handler function above I convert the function parameters to a map containing :payload, :ch and :msg-meta. In our actual application we’re only concerned with reading the message payload and converting it from a JSON string to a Clojure map.

Initially, we started writing something similar to this:
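Roughly this shape: the transformation is passed into subscriber-seq and applied inside the handler before the message reaches the sequence (a sketch; cheshire stands in for whichever JSON library you prefer, and ch is the channel from the earlier setup):

(require '[cheshire.core :as json])

(defn subscriber-seq
  [ch queue f]
  (let [[msg-seq put] (pipe)]
    (lc/subscribe ch queue
                  (fn [ch msg-meta ^bytes payload]
                    (put (f payload)))        ; transform before it reaches the sequence
                  {:auto-ack true})
    msg-seq))

(defn parse-payload
  [^bytes payload]
  (json/parse-string (String. payload "UTF-8") true))

(def messages (subscriber-seq ch "my.queue" parse-payload))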

We have a function that exposes the messages through a sequence, but we pass a kind of transformation function as the last argument to subscriber-seq. This initially felt ok: subscriber-seq calls our handler and extracts the payload into our desired representation before putting it into the queue that backs the sequence.

But we’re pushing more responsibility into subscriber-seq than needs to be there.

We’re just extracting and transforming messages as they appear in the sequence so we can and should be building upon Clojure's existing functions: map and the like. The code below feels much better:
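Something like this: the two-argument subscriber-seq from earlier only moves raw messages onto the sequence, and the transformation becomes an ordinary map over it (again a sketch, with interesting-fields a made-up second step to show composing functions):

(defn payload->map
  [{:keys [payload]}]
  (json/parse-string (String. payload "UTF-8") true))

(defn interesting-fields
  [m]
  (select-keys m [:id :value]))

(def messages
  ;; map a composite transformation over the sequence rather than
  ;; pushing the work into subscriber-seq
  (map (comp interesting-fields payload->map)
       (subscriber-seq ch "my.queue")))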

It feels better for a similar reason to moving the handler to a sequence- we’re making our function less complex and encouraging composition through the many functions that already exist. The map of a composite function is a great example of this for me- transforming the incoming data in the sequence rather than adding more work into subscriber-seq.

Pipe

I’ve probably used Christophe’s pipe function 3 or 4 times this year to take code that started with handler functions and evolved it to deal with sequences. I think it’s a really neat way of making callback-based APIs more elegant.

Multi-armed Bandit Optimisation in Clojure

The multi-armed (also often referred to as K-armed) bandit problem models the problem a gambler faces when attempting to maximise the reward from playing multiple machines with varying rewards.

For example, let’s assume you are standing in front of 3 arms and that you don’t know the rate at which they will reward you. How do you set about the task of pulling the arms to maximise your cumulative reward?

It turns out there’s a fair bit of literature on this topic, and it’s also the subject of a recent O’Reilly book: “Bandit Algorithms for Website Optimization” by John Myles White (who also co-wrote the excellent Machine Learning for Hackers).

This article discusses my implementation of the algorithms which is available on GitHub and Clojars: https://github.com/pingles/clj-bandit.

Enter Clojure

I was happy to attend this year’s clojure-conj and started reading through the PDF whilst on the flight out. Over the next few evenings, afternoons and mornings (whenever I could squeeze in time) I spent some time hacking away at implementing the algorithms in Clojure. It was great fun and I pushed the results into a library: clj-bandit.

I was initially keen on implementing the same algorithms and being able to reproduce the results shown in the book. Since then I’ve spent a little time tweaking parts of the code to be a bit more functional/idiomatic. The bulk of this post covers this transition.

I started with a structure that looked like this:
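Something like a map from each arm’s name to its pull count and estimated value (an illustration of the shape rather than the library’s exact structure):

{:arm1 {:pulls 0 :value 0}
 :arm2 {:pulls 0 :value 0}
 :arm3 {:pulls 0 :value 0}}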

From there I started layering on functions that select the arm with each algorithm’s select-arm implemented in its own namespace.

One of the simplest algorithms is Epsilon-Greedy: it applies a fixed probability when deciding whether to explore (try other arms) or exploit (pull the currently highest-rewarding arm).

The code, as implemented in clj-bandit, looks like this:
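In spirit it’s something like this (a sketch over a collection of arm maps/records carrying a :value, not the library verbatim):

(defn select-arm
  "Epsilon-greedy: with probability epsilon explore a random arm,
   otherwise exploit the arm with the highest estimated :value."
  [epsilon arms]
  (if (< (rand) epsilon)
    (rand-nth (vec arms))
    (apply max-key :value arms)))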

We generate a random number (between 0 and 1) and either pick the best performing or a random item from the arms.

In my initial implementation I kept algorithm functions together in a protocol, and used another protocol for storing/retrieving arm data. These were reified into an ‘algorithm’:
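An illustration of the shape of that design rather than the original code (pick-arm and record-reward are hypothetical pure functions doing the actual work):

(defprotocol BanditAlgorithm
  (select-arm [this arms])
  (update-reward [this arm reward]))

(defprotocol ArmStore
  (fetch-arms [this])
  (store-arms [this arms]))

(defn epsilon-greedy-algorithm
  [epsilon store]
  (reify BanditAlgorithm
    (select-arm [_ arms]
      (pick-arm epsilon arms))
    (update-reward [_ arm reward]
      (store-arms store (record-reward (fetch-arms store) arm reward)))))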

Applying select-arm to the current arm state would select the next arm to pull. Having pulled the arm, update-reward would let the ‘player’ track whether they were rewarded or not.

This worked, but it looked a little kludgey and made the corresponding monte-carlo simulation code equivalently disgusting.

I initially wanted to implement all the algorithms so I could reproduce the same results that were included in the book but the resulting code definitely didn’t feel right.

More Functional

After returning from the conference I looked at the code again and started moving a few functions around. I dropped the protocols and went back to the original data structure to hold the algorithm’s current view of the world.

I decided to change my approach slightly and introduced a record to hold data about the arm.

The algorithms don’t need to know about the identity of any individual arm- they just need to pick one from the set. This tidied up a lot of the code in the algorithms. For example, here’s the select-arm code from the UCB algorithm:
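For illustration, assuming a record holding a pull count and estimated value, a UCB1-style select-arm looks something like this (a sketch, not clj-bandit verbatim):

(defrecord Arm [pulls value])

(defn select-arm
  "UCB1: pull any arm we haven't tried yet, otherwise pick the arm
   with the largest estimated value plus exploration bonus."
  [arms]
  (if-let [unpulled (seq (filter (comp zero? :pulls) arms))]
    (first unpulled)
    (let [total-pulls (reduce + (map :pulls arms))
          bonus       (fn [{:keys [pulls]}]
                        (Math/sqrt (/ (* 2 (Math/log total-pulls)) pulls)))]
      (apply max-key #(+ (:value %) (bonus %)) arms))))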

Functional Simulation

The cool part about the book and its accompanying code is that it includes a simulation suitable for measuring the performance and behaviour of each of the algorithms.

This is important because the bandit algorithms have a complex feedback cycle: their behaviour is constantly changing in the light of data given to them during their lifetime.

For example, following from the code by John Myles White in his book, we can visualise the algorithm’s performance over time. One measure is accuracy (that is, how likely is the algorithm to pick the highest paying arm at a given iteration) and we can see the performance across algorithms over time, and according to their exploration/exploitation parameters, in the plot below:

The simulation works by using a series of simulated bandit arms. These will reward randomly according to a specified probability:
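For example, a Bernoulli arm can be modelled as a function (the name here is mine, but this mirrors the book’s simulation setup):

(defn bernoulli-arm
  "Returns a no-argument function that rewards 1 with probability p,
   otherwise 0."
  [p]
  (fn []
    (if (< (rand) p) 1 0)))

;; pulling the arm is just applying the function
((bernoulli-arm 0.1))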

We can model the problem neatly by creating a function representing the arm; we can then pull the arm by applying the function.

As I mentioned earlier, when the code included protocols for algorithms and storage, the simulation code ended up being pretty messy. After I’d dropped those everything felt a little cleaner and more Clojure-y. This felt more apparent when it came to rewriting the simulation harness.

clj-bandit.simulate has all the code, but the key part was the introduction of 2 functions that performed the simulation:
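They were shaped roughly like this (a sketch built on the epsilon-greedy select-arm and bernoulli-arm sketched earlier; update-arm is my own helper for folding a reward back into the arms’ state):

(defn update-arm
  "Records a pull of the chosen arm and folds the reward into its
   incrementally-updated mean value."
  [arms pulled reward]
  (map (fn [{:keys [name pulls value] :as arm}]
         (if (= name (:name pulled))
           (let [n (inc pulls)]
             (assoc arm :pulls n :value (+ value (/ (- reward value) n))))
           arm))
       arms))

(defn simulate
  "One iteration: select an arm, pull its reward function, fold the
   reward back into the arms and track the cumulative total."
  [select-arm reward-fns {:keys [arms cumulative-reward]}]
  (let [arm    (select-arm arms)
        reward ((get reward-fns (:name arm)))
        total  (+ cumulative-reward reward)]
    {:arms              (update-arm arms arm reward)
     :cumulative-reward total
     :result            {:arm (:name arm) :reward reward :cumulative-reward total}}))

(defn simulation-seq
  [select-arm reward-fns arms]
  (->> {:arms arms :cumulative-reward 0}
       (iterate (partial simulate select-arm reward-fns))
       (drop 1)))

;; e.g. using the epsilon-greedy select-arm from earlier:
;; (->> (simulation-seq (partial select-arm 0.1)
;;                      {:a (bernoulli-arm 0.1) :b (bernoulli-arm 0.9)}
;;                      [{:name :a :pulls 0 :value 0}
;;                       {:name :b :pulls 0 :value 0}])
;;      (take 1000)
;;      (map :result))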

simulation-seq creates a sequence by iterate’ing the simulate function. simulate is passed a map that contains the current state of the algorithm (the performance of the arms); it returns the updated state (based on the pull during that iteration) and tracks the cumulative reward. Given we’re most interested in the performance of the algorithm we can then just (map :result ...) across the sequence. Much nicer than nested doseq’s!

Further Work

At uSwitch we’re interested in experimenting with multi-armed bandit algorithms. We can use simulation to estimate performance using already-observed data, but we’d also need to do a little work to use these algorithms from our web applications.

There are existing Clojure libraries for embedding optimisations into your Ring application:

  1. Touchstone
  2. Bestcase

These both provide implementations of the algorithms and support for storing their state in stores like Redis.

I chose to work through the book because I was interested in learning more about the algorithms but I also like the idea of keeping the algorithms and application concerns separate.

Because of that I’m keen to work on a separate library that makes consuming the algorithms from clj-bandit into a Ring application easier. I’m hoping that over the coming holiday season I’ll get a chance to spend a few more hours working on it.

Protocol Buffers with Clojure and Leiningen

This week I’ve been prototyping some data processing tools that will work across the platforms we use (Ruby, Clojure, .NET). Having not tried Protocol Buffers before I thought I’d spike it out and see how it might fit.

Protocol Buffers

The Google page obviously has a lot more detail but for anyone who’s not seen them: you define your messages in an intermediate language before compiling into your target language.
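For example, the canonical Person message from the Protocol Buffers documentation looks like this:

message Person {
  required string name  = 1;
  required int32 id     = 2;
  optional string email = 3;
}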

There’s a Ruby library that makes it trivially easy to generate Ruby code so you can create messages as follows:

Clojure and Leiningen

The next step was to see how these messages would interact with Clojure and Java. Fortunately, there’s already a few options and I tried out clojure-protobuf which conveniently includes a Leiningen task for running both the Protocol Buffer compiler protoc and javac.

I added the dependency to my project.clj:

[protobuf "0.6.0-beta2"]

At the time, the protobuf library expected your .proto files to be placed in a ./proto directory under your project root. I forked to add a :proto-path so that I could pull in the files from a git submodule.

Assuming you have a proto file or two in your proto source directory, you should be able to invoke the compiler by running

$ lein protobuf compile
Compiling person.proto to /Users/paul/Work/forward/data-spike/protosrc
Compiling 1 source files to /Users/paul/Work/forward/data-spike/classes

You should now see some Java .class files in your ./classes directory.

Using clojure-protobuf to load an object from a byte array looks as follows:
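Something like the following; the namespace and exact function names changed between versions of the library, and the generated class name (Example$Person here) depends on your .proto's java package and outer-class options, so treat this as a sketch:

(use 'protobuf.core)

(def Person (protodef Example$Person))

(def p (protobuf Person :id 4 :name "Bob" :email "bob@example.com"))

;; round-trip: serialise to bytes and load back into a map-like value
(protobuf-load Person (protobuf-dump p))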

Uberjar Time

I ran into a little trouble when I came to build the command-line tool and deploy it. When building with lein uberjar it seemed that the ./classes directory was being cleaned, making the protobuf-compiled Java classes unavailable to the application (which caused the rest of the build to fail- I was using tools.cli with a main fn, which meant using :gen-class).

As always, I turned to Leiningen’s sample project.clj and saw :clean-non-project-classes. The comment mentioned it was set to false by default so that wasn’t it.

It turns out that Leiningen’s uberjar task checks a different option when determining whether to clean the project before executing: :disable-implicit-clean. I added :disable-implicit-clean true to our project.clj and all was good:

$ lein protobuf compile, uberjar

I wasn’t a registered user of the Leiningen mailing list (and am waiting for my question to be moderated) but it feels like uberjar should honour :clean-non-project-classes too. I’d love to submit a patch to earn myself a sticker :)

Introducing clj-esper: simple Esper wrapper for Clojure

clj-esper is a very-early-stages Clojure library to wrap Esper, a complex event processing engine.

I’ve been working on building a simple systems monitoring tool over the past few weeks. The aim of this system is to capture events from various streams, provide a visualisation of how things are ‘now’ and (going forward) provide more intelligent error reporting. We’ve been using it to track success/error request rates and will be looking to fold in more event streams from inside our systems.

There’s a little info in the project’s README.md and some more in the tests, but here’s a high-level example from our monitoring app.
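A sketch of the sort of thing (the attribute syntax for defevent here is illustrative; check the README for the real macro signature):

(defevent InboundRequest [status :int
                          path   :string])

(def event (new-event InboundRequest :status 500 :path "/energy/results"))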

Esper events are modeled in clj-esper as regular maps (albeit with some metadata), and can be created with the defevent macro. The new-event function constructs a map with all fields defined (and with their values coerced- although these mappings are a little incomplete currently, they do enough for me :).

I had originally started an implementation based around defrecord and building classes, but this started to become increasingly complex so this seemed like a reasonable compromise. It’s probably not the cleanest but works ok for now.

TODO

I’m writing this (and publishing the code) because I’d really like to do a couple of things:

  • Esper allows events to contain other maps and complex types. It’d be great to add support for this.
  • Try an alternate modeling of the output event stream as a Clojure sequence (rather than the current callback function).

I’m sure there’s also a fair bit of the code that could be tidied and improved. I’d welcome any forks and pull requests of the clj-esper GitHub repository.

Hiring

We (Forward) are hiring great developers; if you’re interested then please feel free to check out our jobs page or ping me an email for more info.

Clojure, Abstraction and ZeroMQ

For me, the stand-out talk when I attended clojure-conj last year was Mark McGranaghan’s talk on Ring: “One Ring to Bind Them”. I’ve been working with ZeroMQ and Clojure these past few weeks and it’s been all the better because of Mark’s talk.

Core Abstractions

Mark’s talk deftly highlights the beauty of some of Clojure’s core abstractions, sequences and functions, and how Ring modeled its world around them. The following quote is from Alan Perlis and is often used when talking about Clojure:

It’s better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures

Building upon Clojure’s core abstractions makes it easy to compose new functionality from existing libraries. For example, Ring uses the concept of middleware- optional building blocks that decorate your application with additional behaviour. Because the core request handler is just a function, it’s possible to re-use Clojure’s threading macro to build up more complex behaviour:
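For example (wrap-request-logging and wrap-exception-handling are our own middleware in this sketch; wrap-nested-params ships with Ring):

(require '[compojure.core :refer [defroutes GET]]
         '[ring.middleware.nested-params :refer [wrap-nested-params]])

(defroutes main-routes
  (GET "/" [] "Hello"))

(defn wrap-request-logging [handler]
  (fn [req]
    (println (:request-method req) (:uri req))
    (handler req)))

(defn wrap-exception-handling [handler]
  (fn [req]
    (try (handler req)
         (catch Exception e
           {:status 500 :body "Something went wrong"}))))

(def app
  (-> main-routes
      wrap-request-logging
      wrap-exception-handling
      wrap-nested-params))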

The main-routes Var is built using Compojure; this is then wrapped by middleware to handle logging, exception handling and nested query parameters.

Of course, many other languages and libraries solve the problem in a similar way- the Java web frameworks I used to work with would most likely wrap this kind of stuff up via an AOP framework wired in with some IoC container like Spring.

What makes Clojure different, apart from not needing any more frameworks, is that it’s using such a core, language-level abstraction: there’s no need to write middleware adapters to join disparate things together.

ZeroMQ

I mentioned that it had been my working with ZeroMQ that reminded me of all of this. I’m working on wiring Esper into our production environment so that we can monitor the behaviour of our system better.

Much like Luca described in his talk, Node.js and ZeroMQ, I’m using Node.js and ZeroMQ as a kind of glue- tailing logs and other bits of information and re-publishing. This is then aggregated and consumed in one place.

Core Abstractions

There are two things we need to do with our ZeroMQ sockets:

  • Consume data (from an ‘infinite’ stream)
  • Publish data

Of course, it’s trivialised a little, but it’s largely all we need to do.

For consuming data in Clojure, my first cut was fairly traditional- start a (while true) infinite loop that consumed messages and called out to a handler function. It looked a little like this:
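Roughly like this, using the jzmq Java bindings (the socket setup is shortened and the endpoint is made up):

(import '[org.zeromq ZMQ])

(defn consume
  "Blocks forever, pulling messages from the socket and passing each
   one to the handler function."
  [socket handler]
  (while true
    (handler (.recv socket 0))))

(let [ctx    (ZMQ/context 1)
      socket (doto (.socket ctx ZMQ/SUB)
               (.connect "tcp://127.0.0.1:5555")
               (.subscribe (.getBytes "")))]
  (consume socket println))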

It worked, but it’s pretty ugly and certainly not simple. And because it just calls out to a handler, it’s hard to connect this to much of the rest of Clojure’s libraries without some work. Time to re-think.

Going back to our original statement of what we need to do with our sockets it’s possible to see there are some basic building blocks we could use:

  • Consume data (from an ‘infinite’ stream): a sequence
  • Publish data: a function

Our compound consumer above can be re-written and turned into a sequence with repeatedly as follows:
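For example:

(defn message-seq
  "An 'infinite' lazy sequence of messages read from the socket."
  [socket]
  (repeatedly #(.recv socket 0)))

;; e.g. (take 5 (map #(String. %) (message-seq socket)))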

Note how much cleaner the consumer code now is, and how we can re-use any of Clojure’s sequence functions to operate on the stream.

For our publisher we can write a function which returns a function:
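For example:

(defn publisher
  "Returns a function that publishes bytes on the (closed-over) socket."
  [socket]
  (fn [^bytes msg]
    (.send socket msg 0)))

;; e.g. ((publisher pub-socket) (.getBytes "hello"))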

The generated function closes over the initialised socket so that any consumer that wants to publish data can do so by just calling the function passing the bytes to be sent. It provides a clean interface and has the added advantage we can use it with many higher-order functions.

For me, this has really highlighted the power of Clojure over other JVM based languages- clean, simple, core abstractions that can be composed in simple, powerful ways.

Incidentally, I’m working on tidying up the code that interacts with both ZeroMQ and Esper. When they’re a little tidier I’ll open-source them and publish on my GitHub account, and most likely post an article or two here too. Stay tuned!

Introducing Clojure to an Organisation

A thread came up on the london-clojurians mailing list about people using Clojure in London. I replied about our use at Forward and below are the answers I gave to some questions Bruce asked.

How did we introduce Clojure?

I guess much like anything we use, we didn’t really introduce it. We experimented with it, used it to build some internal tools and systems. These would be rolled into production and we’d start getting a feel for where it might/might not be useful.

We’ve favoured building systems in small, independent pieces which makes swapping out an implementation easier, and encourages us to learn and re-learn everything about what that piece did. We’d take the knowledge but feel confident about dropping any/all of the code that ran before.

I used it originally at the beginning of last year for doing some work with Google’s APIs and to build a system to run the ETL flow for one of our businesses. A part of that runs on Hadoop (we have some tens-of-gigabyte XML files) and is still used today, although we dropped the flow system for a JRuby implementation.

More recently, we’ve been working at uSwitch to replace a monolithic .NET system (it’s actually lots of WCF services, but feels like you can’t manoeuvre easily :) with a mix of Ruby and Clojure. A colleague, Mike Jones, was on paternity leave and felt Clojure might be a good fit for the core pricing/comparison of the Utilities. As we went we uncovered all kinds of nuances and things which had been missed in the original .NET, and our Ruby implementation.

How did we get other people involved?

Anyone that worked on the team would need to do some work on the code at some point. Mike has kind of been the lead for it, but, all of us have been working with it.

We place a big emphasis on learning. We started studying SICP and organising weekly meetings to go through examples. Others started working through the Project Euler questions and meeting to go through those.

What have been the biggest problems?

I think it’s been getting the functional-feel/Clojure-feel. After a while you get a feel for what’s good/bad with OO languages- forced out through many years of TDDing and applying OO principles to older systems. I don’t think any of us had experience with a functional language, so you have to rely on more base things to know whether it’s going well. I’d say more recently, I take a lot of inspiration from Stuart Halloway’s emphasis on Simple over Compound to know where a solution lies.

I’d say it’s taken us a long time and lots of extra learning to get to being average Clojure programmers. But, I definitely think it’s made us significantly better programmers overall (a side-effect of learning SICP and some other old-school CompSci books has been making better decisions). It sets the programmer bar pretty high.

What have been the biggest successes?

I actually think one of Clojure’s greatest strengths is its Java interop. If I’m looking at using some Java library, I’ll almost always ‘lein new’ a new project and play around. I find the flow in Emacs/SLIME with Clojure very productive now. I also think the interop code you write ends up being cleaner than with other JVM languages (think JRuby).

The bit of code I wrote at the beginning of last year to do some of our big data ETL is still being used (although given its triviality it’s more to do with it just working than a specific Clojure #win :). The utilities pricing has probably been in production for around 9 months and runs really well. We hired Antonio Garrote, who slated our Clojure code during his interview :), and he’s been using Incanter to help analyse and visualise some stuff he’s been doing with Mahout.

Aside from anything Clojure specific, I think the biggest success has been the emphasis it’s placed on us to be more thoughtful and analytical programmers.

Any surprises?

This isn’t really related to Clojure per se, but, Mike and I both attended clojure-conj last year. During the flight we paired for about 7 hours and rewrote whole parts of the utilities code. Protocols had just been introduced to Clojure and we thought it might be possible to write a tidier implementation. We added them and then deleted them, our implementation didn’t really work out very well. But, what was surprising was how being locked away without the Internet was still productive. We felt it was one of the most productive 7 hours work we’ve ever had.

Dynamic error capture with Clojure

I’ve spent the last few days trying to improve parts of uSwitch.com’s Clojure pricing code. Mostly this has been to add error handling that provides a form of graceful degradation.

The majority of the application deals with pricing the various consumer Energy plans that people can pick from. Although we try and make sure everything validates before it hits the site we’ve found a few weird problems that weren’t anticipated. This can be bad when errors prevent the other results from being returned successfully.

Ideally, we’d like to be able to capture both the errors and the plans we couldn’t price; clients can use the data we have and know what’s missing.

I’m sure I had read an interview in which Rich Hickey said how Clojure’s dynamism helped deal with error handling in a clean and safe way. I can’t find a reference (so it’s possible I made that up), but, I then was looking through The Joy of Clojure and found this:

“there are two directions for handling errors. The first … refers to the passive handling of exceptions bubbling outward from inner functions. But built on Clojure’s dynamic Var binding is a more active mode of error handling, where handlers are pushed into inner functions.”

The code below shows a trivialised version of the problem- a calculation function that might raise an error part-way through the collection.
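A sketch of the shape of the problem (price-plan is a stand-in for the real pricing function):

(defn price-plan
  [plan]
  (if (:broken? plan)
    (throw (Exception. "unable to price plan"))
    (assoc plan :price 100)))

(defn price-plans
  [plans]
  (map price-plan plans))

;; one bad plan fails the whole calculation when the sequence is realised
(price-plans [{:name "a"} {:name "b" :broken? true} {:name "c"}])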

Instead, we can use binding and an error handling function to dynamically handle the problem and push this down the stack.
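Continuing the sketch above, something like this:

(declare ^:dynamic handle-error)

(defn price-plan
  [plan]
  (if (:broken? plan)
    (handle-error plan)
    (assoc plan :price 100)))

(defn price-plans
  [plans]
  (binding [handle-error (fn [plan] nil)]   ; broken plans just become nil
    (doall (map price-plan plans))))        ; realise whilst the binding is in place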

Note that we’re using declare to define our handle-error Var, and that we have to mark it as :dynamic since we’re going to be dynamically binding it.

The next stage is to progress from just returning nil and to capture the errors whilst the records are being processed. We can use Atoms to hold the state and then append more information as we flow through.
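Continuing the same sketch:

(defn price-plans
  [plans]
  (let [errors (atom [])]
    (binding [handle-error (fn [plan]
                             (do (swap! errors conj plan)
                                 nil))]
      {:priced (doall (map price-plan plans))
       :errors @errors})))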

The above example introduces an errors atom that will capture the responses as we map across the sequence. But it definitely now feels a little icky.

  1. We use do to both add the error to our list of errors, and return nil.
  2. Because we’ve now introduced side-effects we must use doall to realise the sequence whilst the handle-error function is rebound.

I have to say, I’m not too sure whether I prefer this more dynamic approach or a more traditional try...catch form.

The dynamic behaviour is definitely very cool, and the authors of The Joy of Clojure say that “using dynamic scope via binding is the preferred way to handle recoverable errors in a context-sensitive manner.” so I’m very inclined to stick with it. However, I think I’ve managed to make it less nice: what originally looked like a neat use of closure to capture errors in a safe way now seems less good? I’d love to hear what more experienced Clojurists make of it.

Other references:

  1. http://kotka.de/blog/2010/05/Didyouknow_IV.html
  2. http://kotka.de/blog/2009/11/TamingtheBound_Seq.html
  3. http://cemerick.com/2009/11/03/be-mindful-of-clojures-binding/

Using partial to tidy Clojure tests

I was reading through getwoven’s infer code on GitHub and noticed a nice use of partial to tidy tests.

Frequently there’ll be a function under test that’s given multiple inputs, with perhaps only a single parameter value changing per example. Using partial you can build a function that closes over its fixed arguments, letting you tidy each call down to just the single (varying) parameter.

The example I found was in infer’s sparse_classification_test.clj but I’ve written a quick gist to demonstrate the point:
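A sketch along those lines (score is a made-up function under test):

(ns example.partial-test
  (:require [clojure.test :refer :all]
            [clojure.string :as str]))

(defn score
  "Made-up function under test: counts matching words, weighted."
  [keywords weight sentence]
  (* weight (count (filter (set keywords) (str/split sentence #"\s+")))))

(deftest scoring-sentences
  ;; partial closes over the arguments that don't vary between examples,
  ;; so each assertion only states the interesting part
  (let [score-sentence (partial score ["great" "good"] 2)]
    (is (= 2 (score-sentence "a great day")))
    (is (= 4 (score-sentence "good and good")))
    (is (= 0 (score-sentence "nothing to see")))))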