Saturday, July 2, 2011

Realtime Hadoop usage at Facebook: The Complete Story

I had earlier blogged about why Facebook is starting to use Apache Hadoop technologies to serve realtime workloads. We presented the paper at the SIGMOD 2011 conference and it was very well received.

Here is a link to the complete paper for those who are interested in understanding the details of why we decided to use Hadoop technologies, the workloads that we have on realtime Hadoop, the enhancements that we made to Hadoop to support our workloads, and the processes and methodologies we have adopted to deploy these workloads successfully. A shortened version of the first two sections of the paper is also described in the slides that you can find here.

12 comments:

  1. On the flip side of the coin, Cloudera is also stepping up a notch with the announcement of Enterprise 3.5, the latest version of its Hadoop management offering. Some of the new features include real-time monitoring.

  2. Nice blog post. I'm not that much aware of this thing but then, thanks.

  3. Thank you, it's very helpful, Dhruba. Actually, I'm planning to work with HBase.

  4. What a wonderful world. What a good job for this post. Very rich and constructive at the same time. I want to say a thumbs up to the creator for keeping this web site simple. Congratulations finally a web site of top-level. Have a nice day!

  5. Finally! I've been waiting for months for Facebook to begin using an efficient technology like Apache Hadoop. I've noticed a real improvement in speed since this change has taken place. Keep up the good work Hadoop!

  6. mridula@ning.com, Oct 19, 2011 01:19 PM

    Hi Dhruba,
    Thanks for the post and the links to the white paper. Interesting read.

    Do you have any additional details you can share on multi-data-center replication? Do you have that at Facebook today? If not, what are some of the ideas for keeping downtime minimal when a data center is down?
    Thanks

  7. Thanks for such informative blog, i hope this blog is helpful for my plumber london business. I hope you like it.

  8. @mridula, I do not yet have many details about multi-data-center replication. We are working on such a product and will update you here when we get to a sane design :-)

  9. mridula@ning.com, Nov 1, 2011 04:56 PM

    Thanks :)

  10. Thanks for sharing the links.

  11. Hello Sir,

    Thanks for posting these files.
    That was good, but I am not familiar with these things.

    Can you suggest a project that is easy and can be done in 3 months' time?

    Thank you, sir.

  12. Is NFS irrelevant going forward? Can a Linux node/cluster run on HDFS alone?
