Introducing node-hdfs: node.js client for Hadoop HDFS

I’m happy to announce a very early cut of node-hdfs, a node.js library for accessing Hadoop’s filesystem. It exists thanks to a lot of work from a colleague of mine, Horaci Cuevas.

A few months ago I was tinkering with the idea of building a Syslog to HDFS bridge: I wanted an easy way to forward web logs (and other interesting data) straight out to HDFS. Given I’d not done much with node.js, I thought it might be a fun exercise.

Over about a week of very late nights and early mornings I followed CloudKick’s example to wrap Hadoop’s libhdfs, and got as far as reading and writing files. Horaci has since picked up the ball and run far and wide with it.

Once you’ve run node-waf configure && node-waf build, you can write directly to HDFS.
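To give a feel for the flow, here is a rough sketch of writing a couple of log lines through a client. The open/write/close shape, the InMemoryFs class and the writeLines helper are all assumptions made up for illustration, not node-hdfs’s actual API; the in-memory stand-in is only there so the sketch runs without a cluster.

```javascript
// Hypothetical in-memory stand-in for an HDFS client. The open/write/close
// shape is an assumption for illustration, not node-hdfs's documented API.
class InMemoryFs {
  constructor() {
    this.files = new Map(); // path -> array of Buffers
  }

  // Open a path for writing and hand a writer object to the callback.
  open(path, callback) {
    const chunks = [];
    this.files.set(path, chunks);
    const writer = {
      write: (data) => {
        const buf = Buffer.from(data);
        chunks.push(buf);
        return buf.length; // number of bytes accepted
      },
      close: () => {},
    };
    callback(null, writer);
  }

  // Return everything written to a path as a UTF-8 string.
  read(path) {
    return Buffer.concat(this.files.get(path) || []).toString('utf8');
  }
}

// Write each line to `path`, newline-terminated, then report the byte count.
function writeLines(client, path, lines, done) {
  client.open(path, (err, writer) => {
    if (err) return done(err);
    let total = 0;
    for (const line of lines) {
      total += writer.write(line + '\n');
    }
    writer.close();
    done(null, total);
  });
}

const hdfs = new InMemoryFs();
let bytesWritten = null;
writeLines(hdfs, '/tmp/access.log', ['GET /', 'GET /about'], (err, total) => {
  if (err) throw err;
  bytesWritten = total;
});
console.log(bytesWritten, JSON.stringify(hdfs.read('/tmp/access.log')));
```

Swapping the stand-in for the real client would mean adapting writeLines to whatever calls the library actually exposes.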

There’s some more information in the project’s README.

Once again, massive thanks to Horaci for putting so much into the library. Forks and patches are most certainly welcome: I’m pretty sure the v8 C++ I wrote is wrong somehow!