Sunday, May 9, 2010

Facebook has the world's largest Hadoop cluster!

It is not a secret anymore!

The Datawarehouse Hadoop cluster at Facebook has become the largest known Hadoop storage cluster in the world. Here are some of the details about this single HDFS cluster:
  • 21 PB of storage in a single HDFS cluster
  • 2000 machines
  • 12 TB per machine (a few machines have 24 TB each)
  • 1200 machines with 8 cores each + 800 machines with 16 cores each
  • 32 GB of RAM per machine
  • 15 map-reduce tasks per machine
That's a total of more than 21 PB of configured storage capacity! This is larger than Yahoo!'s previously known 14 PB cluster. Here are the cluster statistics from the HDFS cluster at Facebook:

[Screenshot: HDFS cluster summary statistics]

Hadoop started at Yahoo! and full marks to Yahoo! for developing such critical infrastructure technology in the open. I started working with Hadoop when I joined Yahoo! in 2006. Hadoop was in its infancy at that time and I was fortunate to be part of the core set of Hadoop engineers at Yahoo!. Many thanks to Doug Cutting for creating Hadoop and to Eric14 for convincing the executive management at Yahoo! to develop Hadoop as open source software.

Facebook engineers work closely with the Hadoop engineering team at Yahoo! to push Hadoop to greater scalability and performance. Facebook has many Hadoop clusters; the largest among them is the one used for Datawarehousing. Here are some statistics that describe a few characteristics of Facebook's Datawarehousing Hadoop cluster:
  • 12 TB of compressed data added per day
  • 800 TB of compressed data scanned per day
  • 25,000 map-reduce jobs per day
  • 65 million files in HDFS
  • 30,000 simultaneous clients to the HDFS NameNode
A majority of this data arrives via scribe, as described in scribe-hdfs integration. This data is loaded into Hive. Hive provides a very elegant way to query the data stored in Hadoop. Almost 99.9% of Hadoop jobs at Facebook are generated by a Hive front-end system. We provide many more details about our scale of operations in our SIGMOD paper titled Datawarehousing and Analytics Infrastructure at Facebook.
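
To give a flavor of what querying via Hive looks like, here is a minimal sketch of a client running an aggregate query over JDBC. The table name, column names and connection endpoint are hypothetical, and the driver class shown is the Hive 0.x (HiveServer1) JDBC driver; treat this as an illustration, not our production code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuerySketch {
        public static void main(String[] args) throws Exception {
            // Hive 0.x (HiveServer1) JDBC driver; host, port and database are illustrative.
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                    "jdbc:hive://hiveserver.example.com:10000/default", "", "");
            Statement stmt = con.createStatement();

            // A typical warehouse-style aggregate over a hypothetical partitioned
            // log table whose rows arrive via scribe: count events per day.
            ResultSet rs = stmt.executeQuery(
                    "SELECT ds, COUNT(1) FROM scribe_log_events " +
                    "WHERE ds >= '2010-05-01' GROUP BY ds");

            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
            con.close();
        }
    }

Hive compiles each such statement into one or more map-reduce jobs, which is why the Hive front-end ends up generating almost all of the jobs on the cluster.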

Here are two pictorial representations of the rate of growth of the Hadoop cluster:

[Charts: growth of the Hadoop cluster over time]

Details about our Hadoop configuration

I have fielded many questions from developers and system administrators about the Hadoop configuration that is deployed in the Facebook Hadoop Datawarehouse. Some of these questions are from Linux kernel developers who would like to make Linux swapping work better with Hadoop workloads; other questions are from JVM developers who may attempt to make Hadoop run faster for processes with large heap sizes; yet others are from GPU architects who would like to port a Hadoop workload to run on GPUs. To enable this type of outside research, here are the details about Facebook's Hadoop warehouse configuration. I hope this open sharing of infrastructure details from Facebook jumpstarts the research community in designing ways and means to optimize systems for Hadoop usage.
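
The configuration files themselves were linked from the original post. As a rough illustration of the kind of settings they cover, the sketch below sets a handful of standard Hadoop 0.20-era properties programmatically and prints them back; the values are examples for discussion, not Facebook's production settings (in practice these live in hdfs-site.xml and mapred-site.xml on the cluster nodes).

    import org.apache.hadoop.conf.Configuration;

    public class WarehouseConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // HDFS: large blocks suit warehouse-style sequential scans.
            conf.setLong("dfs.block.size", 128L * 1024 * 1024);  // 128 MB, example value
            conf.setInt("dfs.replication", 3);                   // HDFS default replication

            // Map-reduce slots per TaskTracker; an illustrative split of the
            // roughly 15 tasks per machine mentioned above.
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 9);
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 6);

            // Per-task JVM heap; Hive jobs commonly override this per query.
            conf.set("mapred.child.java.opts", "-Xmx1024m");

            for (String key : new String[] {
                    "dfs.block.size", "dfs.replication",
                    "mapred.tasktracker.map.tasks.maximum",
                    "mapred.tasktracker.reduce.tasks.maximum",
                    "mapred.child.java.opts"}) {
                System.out.println(key + " = " + conf.get(key));
            }
        }
    }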



53 comments:

  1. what's the carbon footprint/power consumption? mind-boggling..

    ReplyDelete
  2. That SIGMOD link is broken. Here it is - link

    ReplyDelete
  3. Wow....amazing...!!!

    ReplyDelete
  4. Where's the like button on this thing, Dhruba?

    ReplyDelete
  5. what's the carbon footprint/power consumption? mind-boggling..

    ReplyDelete
  6. Hadoop, the rainforest killer

    ReplyDelete
  7. People who are concerned about the carbon footprint, here is my answer: the scenario would have been worse without Hadoop; the number of servers needed to serve such a huge task is humongous, and Hadoop optimizes the resources.

    ReplyDelete
  8. Are you sure that cluster is bigger than the newer 4k machine clusters at Yahoo? I seem to recall they had a couple bigger than this....

    ReplyDelete
  9. @funjon, from what I hear, all of the 4K nodes in Yahoo's cluster have 4 TB of disk each. http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html

    ReplyDelete
  10. That's great! This is technology working at its finest. That is really amazing!

    ReplyDelete
  11. With disks failing (possibly resulting in node shutdown) and the rebuild/recovery that needs to be done, can you let me know how many people it would take to manage a cluster of the size that FB has?

    ReplyDelete
  12. @Naren: we have one admin person who manages the hdfs cluster. He is responsible for deploying new software, monitoring cluster health, reporting and categorizing issues that arise as part of operations, etc. Then maybe another virtual person(s) who spends a few hours every week gathering failed machines/disks and sending them to a repair facility.

    ReplyDelete
  13. Nice article...

    hadoophelp.blogspot.com

    ReplyDelete
  14. A few things that interest me about your configuration files (many thanks for posting!)

    1. You don't use LZO compression, but rather Gzip.

    2. With 12TB/24TB, I'm assuming 12 spindles. Mapper contention on spindles usually creates problems with one DataNode handling > 8 spindles.

    3. With 16 cores, only having 15 slots (9 map, 5 reduce) seems low. And 1GB per task means only using 15GB out of the 32GB on the box.

    Thanks for any feedback on the above,

    -- Ken

    ReplyDelete
  15. This comment has been removed by a blog administrator.

    ReplyDelete
  16. 1. we use LZO for map outputs (less CPU) but use GZIP for reduce outputs (less disk space).

    2. we have 12 spindles.

    3. our map or reduce computations are very CPU heavy and the cluster is bottlenecked on CPU (rather than IOPS). The 1 GB per task is just the default. Most jobs (via Hive) are allowed to set their own JVM heap size.
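
    To make points 1 and 3 concrete, here is a rough sketch of how a job of that era could ask for LZO on the intermediate map output, gzip on the final output, and its own task heap. The property names are Hadoop 0.20-style, the LZO codec class comes from the separately packaged hadoop-lzo library, and the values are examples rather than our exact configuration.

        import org.apache.hadoop.mapred.JobConf;

        public class CompressionSettingsSketch {
            public static void main(String[] args) {
                JobConf job = new JobConf();

                // LZO for intermediate map output: cheaper on CPU.
                job.setCompressMapOutput(true);
                job.set("mapred.map.output.compression.codec",
                        "com.hadoop.compression.lzo.LzoCodec");  // from hadoop-lzo

                // Gzip for the final reduce output: smaller on disk.
                job.setBoolean("mapred.output.compress", true);
                job.set("mapred.output.compression.codec",
                        "org.apache.hadoop.io.compress.GzipCodec");

                // Per-job JVM heap, overriding the 1 GB default mentioned in point 3.
                job.set("mapred.child.java.opts", "-Xmx2048m");
            }
        }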

    ReplyDelete
  17. Hi, I have 4 machines running SuSE Linux 11 and I need to set up a 4-node hadoop cluster. I have 16 GB RAM [16 cores] per machine.
    I need to know how many maps and reduces I should configure. Also, can I have multiple clusters on the same 4 machines by just changing the port numbers and other directories and running hadoop with a separate user?

    ReplyDelete
  18. Yes, you can run multiple hdfs clusters on the same set of machines (as long as they use different ports).
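
    For illustration, the settings that must differ between the two instances are mainly the NameNode address, the DataNode ports and the storage directories. The sketch below builds one such alternate set and prints it as XML; the port numbers and paths are made up, and the property names are the Hadoop 0.20-era ones.

        import org.apache.hadoop.conf.Configuration;

        public class SecondClusterConfigSketch {
            public static void main(String[] args) throws Exception {
                // Start from an empty Configuration so only the overrides are emitted.
                Configuration second = new Configuration(false);

                second.set("fs.default.name", "hdfs://node1.example.com:9010");  // NameNode RPC
                second.set("dfs.http.address", "node1.example.com:51070");       // NameNode web UI
                second.set("dfs.datanode.address", "0.0.0.0:51010");             // block transfer port
                second.set("dfs.datanode.http.address", "0.0.0.0:51075");        // DataNode web UI
                second.set("dfs.datanode.ipc.address", "0.0.0.0:51020");         // DataNode IPC
                second.set("dfs.name.dir", "/data/cluster2/namenode");           // separate metadata dir
                second.set("dfs.data.dir", "/data/cluster2/datanode");           // separate block storage

                second.writeXml(System.out);  // emit as an hdfs-site.xml-style document
            }
        }

    Running each instance as a separate user, as you suggested, also keeps the directories and processes cleanly apart.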

    ReplyDelete
  19. Dhruba, what do you do for realtime analytics? Do you use something like Flume, or do you have your own?

    ReplyDelete
  20. for realtime analytics, we use HBase. http://hadoopblog.blogspot.com/2011/07/realtime-hadoop-usage-at-facebook.html

    ReplyDelete
  21. Do you guys use puppet, chef or custom scripts to configure and keep up to date the machines?

    ReplyDelete
  22. What is your backup plan for the Hadoop cluster? Does backup of a Hadoop cluster make sense for you? If so, do you quiesce Hive before backup? And how is new/modified data detected (as the data sizes are so huge)?

    ReplyDelete
  23. Hadoop did NOT start at Yahoo. It was born out of the Apache Nutch project.

    ReplyDelete
  24. Two questions:

    1) What is the required versus achieved IOPS & latency out of each node's storage subsystem? Asked another way... what were you aiming for and what did you actually get in terms of performance?

    2) How does the failure of -- for example -- 10 nodes affect the cluster?

    ReplyDelete
  25. @toni: we use custom scripts to configure and deploy software on hadoop machines.

    Regarding backup: the Hive cluster is a pure warehouse. That means that if you back up the 20+ TB of new data that comes in every day, all other data can be derived from that stream. So, we have processes to replicate data across data centers, and as long as we can copy the source data to multiple data centers, we have a good story on backup (including DR).

    @Jeff: we focused on job-pipeline latencies. That means a certain pipeline (a bunch of Hive jobs) has to finish within a certain time. Regarding your other question: we have had cases when a rack fails. A rack has 20 machines. When this happens, we see that HDFS re-replicates the data and this re-replication finishes in about an hour, i.e. our mean-time-to-recover from a failed rack is about 1 hour. However, jobs continue to run normally during this period.

    ReplyDelete
    Replies
    1. Dhrub,
      Your comment >> A rack has 20 machines. When this happens, we see that HDFS re-replicates the data and this re-replication finishes in about an hour, i.e. our mean-time-to-recover from a failed rack is about 1 hour.

      Unlike Facebook, where you have 2000 machines (with 20 machines per rack, so I am assuming you have 100 racks) and the re-replication takes about an hour, for relatively small clusters - say 60 nodes (i.e. 3 racks with 20 nodes each) - when a rack fails, the re-replication can overwhelm the top-of-rack switch and the re-replication duration can be longer. Is the re-replication rate-limited? Any suggestions and/or possible performance numbers for recovery time for such failures?

      Delete
    2. The re-replication rate is not limited (AFAIK).

      Delete
  26. Ya know, I think I have an idea that would reduce all the hardware requirements down to a fraction of the thousands of servers currently employed.

    It's a purely analytic solution, but it would work and would be very scalable, especially with the larger sets of data.

    ReplyDelete
  27. Great helpful information. Thanks for providing wonderful stats of Hadoop usage at FB.
    Hadoop can be used for OLAP as well as OLTP.
    Please click why hadoop is introduced

    ReplyDelete
  28. @Dhruba: Thanks for (all) the post(s). Can you give us updated figures about the cluster size at the beginning of 2012? Is the growth still amazing?

    ReplyDelete
  29. Great post....

    worldofhadoop.blogspot.com

    ReplyDelete
  30. We found interesting link for the Hadoop developer

    60 Hadoop Interview Question
    http://www.pappupass.com/Hadoop_Interview_Question.pdf

    follow link for Hadoop Exam Simulator
    http://www.pappupass.com/class/index.php/hadoop/hadoop-exam-simulator

    ReplyDelete
  31. Hello,
    What amazing news this is! The Datawarehouse Hadoop cluster at Facebook becoming the largest known Hadoop storage cluster in the world is really excellent information. I love it. Thanks a lot.
    Used Pallet Racks

    ReplyDelete
  32. I think Yahoo has around 42000 nodes in their clusters and LinkedIn has around 4000 nodes. Maybe FB has more data in it, but when it comes to the number of data nodes it will be Yahoo, I guess...

    ReplyDelete
  33. @Pradeep: The 42000 nodes number from Yahoo is the total number of nodes in all the hdfs clusters in production at Y!, and not from a single cluster.

    ReplyDelete
  34. Hey, this shows the scope of HADOOP.
    What do you think, programmers?
    It's time to learn Hadoop online.
    I am looking for an online hadoop live tutorial, meaning an online course,
    and one of my friends suggested WIZIQ for online learning, having course id 21308,
    and they are giving a free demo for any course.
    So what do you think?
    I am thinking of taking such a course and making scope for myself in this field.
    Wanna learn HADOOP? Then do check out WIZIQ once.
    Thank You.

    ReplyDelete
  35. Very nice and informative blog.

    @Shruti: Yes, Hadoop has great scope nowadays,
    as you can get an idea from this blog too.
    I took this course from WIZIQ and now I am working as a HADOOP developer.
    I want to tell you that this course is awesome, as the tutor is Cloudera certified and he knows where we lag and where we make mistakes.
    Thanks to him, as I got a job only because of that tutor.
    And yes, WIZIQ is very supportive and very responsive.
    Just close your eyes and click the enroll button.
    :)
    Hope it will be helpful for you.

    ReplyDelete
  36. I think the things you covered through the post are quite impressive; good job and great effort. I found it very interesting and enjoyed reading all of it... keep it up, good job.

    ReplyDelete
  37. Your blog is very nice. Hadoop is very important for any organization, so hadoop training is a must to improve your business.
    Thanks for the tips. hadoop online tutorial

    ReplyDelete
  38. Great helpful information. Thanks for providing wonderful hadoop information. 123trainings provides hadoop online training; we can see a free demo class:
    hadoop online training classes in hyderabad.

    ReplyDelete
  39. It's amazing and this information is very, very useful for us. 123trainings also provides hadoop online training;
    to see a free demo: hadoop online training classes in india

    ReplyDelete
  40. It's amazing and this information is very, very useful for us. Hadoop online trainings also provides hadoop online training.

    ReplyDelete
  41. It is a good piece of knowledge and it is useful for hadoop learners. 123trainings provides the best hadoop online training; to see a free demo class: Hadoop online training demo class in Ameerpet

    ReplyDelete
  42. It is a good piece of knowledge and it is useful for hadoop learners. Hadoop online trainings provides the best hadoop online training.

    ReplyDelete
  43. It is good and it is very helpful for us. 123trainings provides the best online Hadoop training. To see a demo: Hadoop online training demo class in hyderabad

    ReplyDelete
  44. Thanks a lot for the wonderful information; it is useful for us. 123trainings provides the best Hadoop online training. To see a free demo class: Hadoop online training class in india

    ReplyDelete
  45. Is there an architecture diagram explaining the latest Hadoop cluster configuration at Facebook? Such as the size of data processed and the number of nodes, etc.

    ReplyDelete