Comments on "HDFS: Hadoop and Solid State Drives" by Dhruba Borthakur

Suraj Waghulde (2014-01-06):
Very nice article, Dhruba. Hadoop as-is will not get much of a speedup from SSDs, as you described, and HBase may do only slightly better. But I think Spark/Shark should get a significant speedup from SSDs. I wanted to get your thoughts on Spark/Shark.

Dhruba Borthakur (2013-08-04):
1. Yes.
2. Yes.
3. I do not think overall job latency will improve by 5X. The copy phase is a big bottleneck, and it could be partially addressed by fast storage for map outputs.
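The intuition behind point 3 is Amdahl's law: a 5X speedup applied only to the copy phase caps the whole-job gain at whatever fraction of time that phase occupies. A rough sketch in Python, where the 30% copy-phase fraction is an illustrative assumption, not a figure from the thread:

```python
# Amdahl's-law sketch: a 5X speedup applied only to one phase of a job.
def overall_speedup(phase_fraction, phase_speedup):
    """Whole-job speedup when only one phase gets faster."""
    return 1.0 / ((1.0 - phase_fraction) + phase_fraction / phase_speedup)

# If the copy phase is 30% of total job time (assumed) and SSDs make it
# 5X faster, the job as a whole speeds up by only ~1.32X, nowhere near 5X.
print(round(overall_speedup(0.30, 5.0), 2))
```

This is why a large I/O boost to one stage shows up as only an incremental gain at the application level.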
Anonymous (2013-08-04):
Dhruba,

This is one of the most informative, precise, concise, and well-written posts on the Hadoop/SSD topic. Kudos, and thanks for sharing!

What are your thoughts on the following ways of harnessing SSDs smartly for Hadoop?

1. Would server-side SSD caching (read-only I/O) be a better, more cost-effective performance boost than using the SSD as a Hadoop/HBase storage device? It could potentially avoid the SSD write pitfalls.

2. Regarding your observation, "In a data warehouse, a large portion of queries access only recent data... Another use case would be to put map outputs on SSDs; this could potentially reduce map-output-fetch times": again, would it make more sense to deploy an SSD read-only cache for frequently accessed 'recent' map output instead of using the SSD as the storage device for that data?

3. Let's assume the SSD itself (whether used as storage or cache) brings a 5X I/O performance boost over HDD when map output uses it. In the overall Hadoop workflow and scheme of things, does that matter? Is the I/O boost only an incremental gain for the application, or does it translate to roughly a 5X boost in overall application performance?

Look forward to your response.

Thanks.

Varun Sharma (2013-06-28):
Quick question on the HBase benchmark: you said 45% CPU at 30K ops per second. Did you try running two region server processes to see if you could push this to 70K qps and CPU to 90%? Running two processes would give you some freedom from concurrency issues.

Another question: is the workload batched multi-gets against HBase?

Dhruba Borthakur (2013-01-18):
Hi Alex, the FB version of HDFS (https://github.com/facebook/hadoop-20) has some fixes that improve the random-read performance of local short-circuit reads from a mere 92K ops/sec (when I measured it) to a million reads/sec. I do not know what those particular fixes are, but I am assuming that HDFS will make some significant improvements to latency going forward.

Anonymous (2013-01-18):
Any updates on this in the months since May? I am considering SSDs for Hadoop, but this basically tells me not to bother...

Ramkrishna (2012-10-19):
Very nice one.

jfortney (2012-09-13):
I am relatively new to HDFS administration, and I am really struggling to find good information on ideal HDFS block sizing to support efficient MapReduce jobs as well as HBase datafile/index storage. Any suggestions on good resources? Thanks,
Jon

Dhruba Borthakur (2012-08-03):
@Vladimir: I have not tried MapR. And we have not yet committed the above-mentioned change to Apache HDFS.

Vladimir Rodionov (2012-08-03):
"We found that the read-shortcircuit code path spent considerable time in DNS lookup calls and updating metric counters. We fixed these two pieces of code and observed that HDFS could sustain a peak random-read throughput of around 92K ops/sec; CPU usage was now close to 95%."

Have you committed your changes back to Apache?
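The fix quoted above is an instance of a general pattern: hoist slow, repeatable work (such as DNS resolution) out of a hot read path by caching its result. A hypothetical sketch in Python; the resolver function, hostname, and counter are invented for illustration and are not the actual datanode code:

```python
import functools

# Hypothetical stand-in for a slow per-read DNS lookup. We count
# invocations to demonstrate that caching removes it from the hot path.
lookup_count = 0

@functools.lru_cache(maxsize=1024)
def resolve_host(hostname):
    global lookup_count
    lookup_count += 1          # simulate the expensive resolution once
    return "10.0.0.1"          # placeholder address

# The hot read path now pays for resolution once per hostname,
# not once per read.
for _ in range(100_000):
    resolve_host("datanode-17.example.com")

print(lookup_count)  # 1
```

The same reasoning applies to metric-counter updates: batching or sampling them keeps bookkeeping off the per-operation critical path.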
Vladimir Rodionov (2012-08-03):
Dhruba, have you tried MapR? It definitely won't help the HBase numbers, but it should improve raw DFS I/O and reduce CPU usage.

Dhruba Borthakur (2012-05-14):
@Sujoy: you are absolutely right. In fact, we currently run multiple server instances per SSD just to be able to utilize all the IOPS. This is kind of a poor man's solution to the problem. You also have to have enough CPU on the server to drive multiple database instances on the same machine.

Sujoy (2012-05-14):
The last comment was by me (Sujoy)... not sure why it showed up as unknown.

Sujoy (2012-05-14):
Thanks for the article, Dhruba. If the SSD could be shared so that a handful of servers split its 200K+ IOPS (thereby amortizing the cost), do you think a jump from 400 IOPS (on HDDs) to 35-92K IOPS on an SSD would help a real workload on HBase/HDFS?

Dhruba Borthakur (2012-05-08):
Agree again, Matt. An HBase workload that currently runs on spinning disks could run much better if/when the data is served from SSDs. But when it is time to benchmark HBase against some other database on SSD, there is lots to be done.

mcorgan (2012-05-07):
Even if it's poorly utilized, could the SSD still be a big improvement? Maybe you only get 4K reads/second, but if your app was previously bottlenecked by the disks at 400 reads/second, it could make a huge difference to the overall application.
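The arithmetic behind running multiple server instances per SSD, mentioned a few comments up, is a simple back-of-envelope calculation. The per-instance ceiling below is an assumption taken from the low end of the 35-92K ops/sec range discussed in the thread:

```python
import math

# Rough sizing: how many server instances does it take to saturate one
# SSD, if each instance tops out at ~35K random-read ops/sec?
ssd_iops = 200_000           # IOPS the device can deliver (assumed)
per_instance_iops = 35_000   # ceiling one HDFS/HBase instance sustains (assumed)

instances = math.ceil(ssd_iops / per_instance_iops)
print(instances)  # 6 instances per SSD to use the full device
```

This is the "poor man's solution": the device is shared by stacking processes, at the cost of needing enough CPU to drive them all.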
Dhruba Borthakur (2012-05-07):
Agree with you, Matt. Hybrid storage reduces the storage cost compared to a pure-SSD setup. But the thing is that an application can only do a few thousand 16K random reads per second from an HDFS datanode, so you won't be able to utilize the full IOPS potential of the SSD.

mcorgan (2012-05-07):
Following on to Andrew's comment about hybrid storage: one potential setup would be to store the primary replica of a file on SSD and the two backup replicas on spinning disks. The write path in HBase is always sequential, where spinning media does okay. If the region server can work with HDFS to localize and read from the copy of a file that is on an SSD, then you get most of the benefits of the SSD at a third of the cost.

You could then go further and get more granular about what is stored on the SSDs: certain tables, certain regions, non-datablocks, evicted block-cache entries, etc. But it seems like storing just the primary replica on SSD could buy you a lot relative to the implementation complexity.

Dhruba Borthakur (2012-05-07):
Thanks for your comments, Andrew. It is very much possible to make HDFS expose APIs so that data goes to spinning disks and other metadata objects go to SSD. It would no doubt improve HBase performance, and it is worth doing. The one thing I am still doubtful about: as we get machines with more and more cores, some of our existing database technologies might not be able to utilize all of them effectively.

Andrew Purtell (2012-05-07):
... of course I am not talking about a pure (cold) read workload, but something more akin to FlashStore: http://www.vldb.org/pvldb/vldb2010/papers/I04.pdf

Andrew Purtell (2012-05-07):
If we consider an HDFS that adds storage-device-type information to volume and block metadata, plus an extension of the HDFS API to specify storage-device affinity, then we might see an HBase that stores data on spinning media but puts WALs, flush files, and other short-lived, frequently accessed objects on SSD. Do you think we might see enough benefit from this, plus the (theoretical) improvements you mention, to make a hybrid Hadoop+HBase+SSD+SATA storage architecture make sense?
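The primary-replica-on-SSD idea discussed above can be sketched as a toy placement function. The tier names and the function itself are hypothetical illustrations of the concept, not an actual HDFS block-placement API:

```python
# Toy block-placement sketch for the hybrid layout discussed above:
# primary replica on SSD, remaining replicas on spinning disk.
def place_replicas(num_replicas=3, hot=True):
    """Return a storage tier for each replica of a block."""
    if not hot:
        # Cold data stays entirely on cheap spinning media.
        return ["hdd"] * num_replicas
    # Hot data: one fast copy for reads, durable copies on disk.
    return ["ssd"] + ["hdd"] * (num_replicas - 1)

print(place_replicas())           # ['ssd', 'hdd', 'hdd']
print(place_replicas(hot=False))  # ['hdd', 'hdd', 'hdd']
```

With default 3-way replication, only a third of the raw capacity needs to be flash, which is the cost argument made in the thread.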