Tuesday, July 28, 2009

Hadoop and Condor

My graduate work in the mid-nineties at the University of Wisconsin focused on Condor. Condor has an amazing ability to checkpoint a running process and migrate it from one machine to another when needed. It also has a very powerful scheduler that matches job requirements with machine characteristics.

One of the major inefficiencies of the Hadoop schedulers (the Fair Scheduler and the Capacity Scheduler) is that they are not resource-aware: they place tasks without weighing what a job needs against what a machine can offer. There is some work in progress in this area; see HADOOP-5881. Condor's ClassAds mechanism could be used to match Hadoop jobs with machines very elegantly.
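To make the matchmaking idea concrete, here is a minimal Python sketch of two-way matching in the spirit of ClassAds. It is not Condor's ClassAd language or any Condor or Hadoop API. The `ClassAd` class, the `matches` and `best_machine` helpers, and the attribute names and values (`Name`, `Memory`, `Owner`) are all assumptions made up for illustration. The point it tries to show is that both sides advertise attributes and requirements, a match needs both sides to agree, and the job's rank picks among the acceptable machines.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Attrs = Dict[str, Any]


def no_preference(other: Attrs) -> float:
    """Default rank: every acceptable match is equally good."""
    return 0.0


@dataclass
class ClassAd:
    """Hypothetical stand-in for an ad: attributes plus requirements and rank."""
    attrs: Attrs
    requirements: Callable[[Attrs], bool]       # evaluated against the other ad
    rank: Callable[[Attrs], float] = no_preference


def matches(job: ClassAd, machine: ClassAd) -> bool:
    # A match is symmetric: the job must accept the machine AND
    # the machine must accept the job.
    return job.requirements(machine.attrs) and machine.requirements(job.attrs)


def best_machine(job: ClassAd, machines: List[ClassAd]) -> Optional[ClassAd]:
    # Among machines that match, pick the one the job ranks highest.
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job.rank(m.attrs))


# Illustrative data only; names and sizes are invented.
machines = [
    ClassAd({"Name": "node1", "Memory": 2048}, requirements=lambda job: True),
    # node2 has the most memory but only accepts jobs owned by "ops".
    ClassAd({"Name": "node2", "Memory": 8192},
            requirements=lambda job: job["Owner"] == "ops"),
    ClassAd({"Name": "node3", "Memory": 4096}, requirements=lambda job: True),
]

job = ClassAd(
    {"Owner": "alice"},
    requirements=lambda m: m["Memory"] >= 4096,  # the job needs at least 4 GB
    rank=lambda m: m["Memory"],                  # and prefers more memory
)

chosen = best_machine(job, machines)
print(chosen.attrs["Name"] if chosen else "no match")
```

In this sketch node1 is rejected by the job (too little memory) and node2 rejects the job (wrong owner), so the job lands on node3. A resource-aware Hadoop scheduler could apply the same kind of check before handing a task to a machine.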

Here is one of my recent presentations, given at the Israeli Association of Grid Technologies, that talks about the synergies between Condor and Hadoop.