Tuesday, July 28, 2009

Hadoop and Condor

My graduate work in the mid-nineties at the University of Wisconsin focused on Condor. Condor has an amazing ability to checkpoint a running process and migrate it from one machine to another when needed. It also has a very powerful scheduler that matches job requirements with machine characteristics.

One of the major inefficiencies with the Hadoop schedulers (the Fair Scheduler and the Capacity Scheduler) is that they are not resource-aware. There has been some work in progress in this area, HADOOP-5881. Condor's ClassAds mechanism could be used to match Hadoop jobs with machines very elegantly.
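To give a flavor of what resource-aware matchmaking could look like, here is a sketch in ClassAd syntax: a machine advertises its attributes, and a job declares a Requirements expression that the matchmaker evaluates against them. The `HasHadoop` attribute and the specific numbers are hypothetical, just for illustration.

```
# Machine ClassAd (advertised by an execute node)
Memory    = 16384        # MB of RAM on this node
OpSys     = "LINUX"
HasHadoop = True         # hypothetical custom attribute set by the admin

# Job ClassAd (submitted alongside the Hadoop task)
Requirements = (OpSys == "LINUX") && (Memory >= 4096) && (HasHadoop == True)
Rank         = Memory    # among matching machines, prefer more memory
```

The matchmaker pairs the job with any machine whose attributes satisfy `Requirements`, then uses `Rank` to break ties, which is exactly the kind of resource awareness the Hadoop schedulers lacked.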

Here is a recent presentation I gave at The Israeli Association of Grid Technologies that talks about the synergies between Condor and Hadoop.


  1. Hello Dhruba,

    first, thanks for publishing your blog. It's a great read for people looking for a way to get into Hadoop for huge environments.

    A student project I am currently working on is taking a deeper look into Condor and Hadoop. I think we're looking in the same direction as you were in this post. But it's really hard to find anything about it on other blogs/sites.

    Do you at Facebook still use Condor to deploy jobs? Or did you solve your problems with Hadoop itself?
    Would be great if you could help me out here and maybe gimme some links/status updates. :)

    Thanks in advance,

  2. hi martin, we do not use Condor at Facebook.

  3. Ah ok. Thanks for your quick answer. :)
    I thought you might still use condor, since there was a Condor tag in your "Facebook has the world's largest Hadoop cluster!" post. ;)

    Are you using any other applications/frameworks to better match jobs to
    certain machines or are you handling all the job-distribution with Hadoop?

  4. we are just using hadoop for job matching and scheduling...

  5. Is that enough for you? Don't you want any mechanism to better distribute the jobs depending on job-size and machine-size?
    Or are your cluster PCs all pretty much the same level?

  6. We use the Fair Scheduler in Hadoop. It uses memory and CPU load to determine which machine to schedule on. But obviously, it can be improved.