Monday, November 9, 2009

HDFS High Availability

I have encountered plenty of questions about the single point of failure for the HDFS NameNode. The most common concern being that if the NameNode dies, then the whole cluster is unavailable. This means that HDFS is unsuitable for applications that need a high degree of uptime. This is not a problem when you run map-reduce jobs on HDFS, especially because a map-reduce system is a batch system and the uptime requirements of a batch system is typically not that stringent.

In recent times, I am starting to see lots of distributed applications that are using HDFS as a general purpose storage system. These applications range from multimedia servers that store mails, photos, videos, etc to systems that store updates from thousands of distributed sensors. These applications prefer HDFS because they can store a large volume of data. These applications do not use map-reduce to mine any of the information stored in HDFS. However, these applications do need consistency and availability of data. The Brewer's CAP Theorem states that most distributed systems need some tradeoffs among Consistency, Availability and Partition tolerance. HDFS's does an excellent job of providing Consistency of data at all times. Traditionally, it did not address Availability and Partition tolerance earlier. That could change to a certain extent with HDFS 0.21 release. HDFS 0.21 has a new entity called the BackupNode that receives real-time updates about transactions from the NameNode. This is the first step in making making HDFS highly available.

Here is a slide-deck that I wanted to present at ApacheCon 2009 about the current state of affairs regarding High Availabilty with HDFS.