Archive for January 2012

Protocol Buffers with Clojure and Leiningen

This week I’ve been prototyping some data processing tools that will work across the platforms we use (Ruby, Clojure, .NET). Having not tried Protocol Buffers before I thought I’d spike it out and see how it might fit.

Protocol Buffers

The Google page obviously has a lot more detail but for anyone who’s not seen them: you define your messages in an intermediate language before compiling into your target language.

There’s a Ruby library that makes it trivially easy to generate Ruby code so you can create messages as follows:

Clojure and Leiningen

The next step was to see how these messages would interact with Clojure and Java. Fortunately, there are already a few options. I tried out clojure-protobuf, which conveniently includes a Leiningen task for running both the Protocol Buffer compiler (protoc) and javac.

I added the dependency to my project.clj:

[protobuf "0.6.0-beta2"]

At the time, the protobuf library expected your .proto files to be placed in a ./proto directory under your project root. I forked it to add a :proto-path option so that I could pull in the files from a git submodule.

Assuming you have a proto file or two in your proto source directory, you should be able to invoke the compiler by running:

$ lein protobuf compile
Compiling person.proto to /Users/paul/Work/forward/data-spike/protosrc
Compiling 1 source files to /Users/paul/Work/forward/data-spike/classes

You should now see some Java .class files in your ./classes directory.

Using clojure-protobuf to load an object from a byte array looks as follows:

Uberjar Time

I ran into a little trouble when I came to build and deploy the command-line tool. When building with lein uberjar, the ./classes directory seemed to be cleaned first. That made the protobuf-compiled Java classes unavailable and caused the rest of the application to fail to build (I was using tools.cli with a main fn, which meant using :gen-class).

As always, I turned to Leiningen’s sample project.clj and saw :clean-non-project-classes. The comment mentioned that it was set to false by default, so that wasn’t it.

It turns out that Leiningen’s uberjar task checks a different option when determining whether to clean the project before executing: :disable-implicit-clean. I added :disable-implicit-clean true to our project.clj and all was good:

$ lein protobuf compile, uberjar
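
For reference, here is a minimal sketch of how the relevant parts of such a project.clj might fit together. Only the protobuf dependency and the :disable-implicit-clean flag come from this post. The project name (taken from the directory in the compiler output above), the Clojure and tools.cli versions, and the :main namespace are assumptions for illustration.

```clojure
;; Hypothetical project.clj sketch (Leiningen 1.x era). Only the protobuf
;; dependency and :disable-implicit-clean come from the post; everything
;; else is an illustrative assumption.
(defproject data-spike "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]      ; assumed version
                 [org.clojure/tools.cli "0.2.1"]    ; assumed version
                 [protobuf "0.6.0-beta2"]]          ; from the post
  ;; Assumed entry-point namespace; it would use :gen-class for the CLI.
  :main data-spike.core
  ;; Stop uberjar from cleaning ./classes, so the classes produced by
  ;; `lein protobuf compile` are still there when the jar is built.
  :disable-implicit-clean true)
```

With the flag set, the order of the two tasks matters: the protobuf classes have to be compiled before uberjar runs, which is why they are chained in a single invocation.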

I wasn’t a registered user of the Leiningen mailing list (and am waiting for my question to be moderated), but it feels like uberjar should honour :clean-non-project-classes too. I’d love to submit a patch to earn myself a sticker :)

Filed under  //  clojure   leiningen   protocolbuffers   ruby  

Social Enterprise Development

When I read the transcript of Linus Torvalds’s talk on Git at Google, about four years ago, I was working at an investment bank in London. I had just started using GitHub for hosting my own side-projects and for doing some open-source work. Fast forward to today and I’ve just read an article about the fast rise of GitHub as the software repository of choice for open-source development and an interesting space for Enterprise hosting.

All the banks I worked in were extremely centrally controlled: you’d use approved libraries and tools only. However, the way that the different teams interacted seemed very close to the open-source model espoused by Linux. I think there’s a very strong benefit such centralised organisations could gain through adopting a slightly more bazaar approach.

Teams and Structure

It was probably the second or third bank I’d worked in and I was struck by the way that teams at the bank were structured (as compared to the other types of organisations I’d worked at). One set of developers would maintain the front-end of the trading system, another would work on the back-office services that would process the trades, and another would provide the quantitative libraries used for pricing them.

The work each team did was quite different, requiring different types of skill and changing at different rates.

The quantitative libraries would need to be updated as the bank changed the way it modelled the trades it performed, and they would frequently receive performance improvements. The trading application would receive the occasional UI tweak or new feature to allow traders to enter new kinds of trade, or provide quicker ways for them to do it.

The front-end application team would frequently need to incorporate quantitative library changes as the quant team released new versions (at least once a week). This would require the front-end team to run tests to ensure that the newly integrated pricing library behaved properly: producing identical trades that priced the same.

Often changes within the pricing library would break the application by throwing errors or producing trades with corrupted numbers. The front-end team would then have to step through the pricing integration code to figure out where it went wrong.

Of course, the pricing libraries were kept close to the core; trying to figure out why something had changed would require the front-end teams to reverse-engineer a little about the quantitative library. The quant team would often be too busy to help much with identifying the cause of the problems. And, after all, the problems were just as likely to be caused by an error in the integration code.

Social Models

What struck me at the time was how close the social behaviour of the teams was to open-source development and how distributed source-control (like Git) and social software (like GitHub) would be a relatively natural extension.

Going to the project/product/technical lead would almost certainly result in your request being queued into a long backlog. Instead, you would speak directly to another developer you were friendly with and they’d help point you in the right direction or confirm there was a problem.

Most front-end developers were capable of stepping through the pricing code and identifying what was causing their problem. They may not have been the best people to maintain pricing code on a day-to-day basis, but they could certainly diagnose and fix most of the problems they uncovered.

But, because the quant team were locked away, making changes meant either entering the backlog or relying on a somewhat rogue (but friendly) quant.

Distributed Enterprise

The models of distributed source control and open-source application development seem natural enterprise fits: teams focus on their libraries with the development co-ordinated through their regular teams in largely the same way it is currently.

The difference is that code repositories could be more easily shared between teams. Nobody outside the quant team pushes to the central repository directly. Instead, they fork the pricing library to do their debugging and analysis, hopefully find the problem, create a branch to fix the issue and submit a pull request. They speak to their go-to person on the quant team and talk things through. The quant team can then either pull the change in directly or use it as a starting point for integrating their own changes.

I remember talking this through with one of the front-end application developers at the time. It seemed like an obvious (albeit bold) thing to try.

Filed under  //  distributedsourcecontrol   enterprisedevelopment   git   github  