"Dark" storage: wastefulness or just good engineering?
Posted by: mk408
on June 24, 2009
Having recently read more and more discussion about so-called dark storage, I've been reminded of something I routinely try to impress upon managers, especially clients: unless your use case is archiving, total bytes is a poor metric for storage.
In fact, the term "storage" itself may be partly to blame for the continued misconception. One need only glance at the prices of commodity disks to recognize that there isn't anything near a linear relationship between cost and bytes stored.
A quarter century ago was the golden age of the mini-computer, and the reign of the micro- was dawning. The Fujitsu Eagle was, at least in the semiconductor industry here in Silicon Valley, very popular, so it will be my yardstick. At a third of a gigabyte in usable space and just under 1.9MB/s, one could read or write the whole thing in just under 3 minutes. Today, a 1.5TB Barracuda is 4500 times the size but only 66 times the throughput, so it takes over 3 hours to go through the whole thing. A 6th-generation 450GB Cheetah is better, at under an hour.
I like the Eagle's 3 minutes as a rule of thumb. That's 21GB on larger, modern, 7200 RPM disks, and I suggest that everything beyond that may as well be considered superfluous or archive storage. Accepting this measure end-to-end means that one would only want 72GB accessible to a host off each 4Gb/s FC or 216GB per 4x SAS. Ouch.
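For anyone who wants to check the arithmetic, here is a rough sketch of the numbers above. The function names are mine, units are decimal, and the link rates (about 400MB/s of payload on 4Gb/s FC, about 1200MB/s on 4x 3Gb/s SAS) are assumptions chosen because they reproduce the 72GB and 216GB figures; real-world throughput will differ.

```python
# Back-of-the-envelope figures; decimal units (1 GB = 1000 MB) throughout.
RULE_SECONDS = 3 * 60  # the Eagle-inspired "3 minute" rule of thumb


def full_scan_seconds(capacity_mb, throughput_mb_s):
    """Time to read or write the whole device sequentially."""
    return capacity_mb / throughput_mb_s


def budget_gb(throughput_mb_s, seconds=RULE_SECONDS):
    """Capacity (in GB) that can be streamed within the time budget."""
    return throughput_mb_s * seconds / 1000


eagle_mb_s = 1.9
barracuda_mb_s = 66 * eagle_mb_s  # "66 times the throughput", roughly 125 MB/s

# A third of a gigabyte for the Eagle, 1.5 TB for the Barracuda.
eagle_scan = full_scan_seconds(1000 / 3, eagle_mb_s)
barracuda_scan = full_scan_seconds(1_500_000, barracuda_mb_s)

print(f"Eagle full scan:     {eagle_scan / 60:.1f} min")
print(f"Barracuda full scan: {barracuda_scan / 3600:.1f} h")

# Assumed payload rates (not measured): 4Gb/s FC and 4x 3Gb/s SAS.
print(f"4Gb/s FC budget:     {budget_gb(400):.0f} GB")
print(f"4x SAS budget:       {budget_gb(1200):.0f} GB")
```

Plugging in a different sustained rate for your own disks or links gives the corresponding 3 minute capacity.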
A whitepaper from Xiotech criticizes storage vendors' performance numbers as being misleading, since they are based on short-stroking benchmarks, rather than representing the performance of the whole disk.
I suggest that short-stroking disks as a matter of course and leaving the rest purposefully "dark" is smart engineering. Suddenly, those 160GB drives look much more appealing than the 1.5TB ones, at least for performance-sensitive uses, such as databases.
Certainly, there are use cases where data beyond the 3 minute limit is still useful: anything that rarely, if ever, gets read. That tends to include backups, archives, audit trails, and even database intent logs. One may be able to have all these coexist on the same spindles as the "high performance" uses, but it would require careful forethought and testing.
My 21GB example with a 160GB disk means 87% "dark," to simulate an Eagle. It's a high percentage but nothing to be alarmed about, as long as it's done with full awareness.
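The "dark" share is easy to work out for other disk sizes too. A minimal sketch, where dark_fraction is just an illustrative helper and the 21GB working set is the figure from this post:

```python
def dark_fraction(used_gb, capacity_gb):
    """Share of the disk deliberately left unused."""
    return 1 - used_gb / capacity_gb


# Capacities mentioned in the post: 160GB, 450GB Cheetah, 1.5TB Barracuda.
for capacity_gb in (160, 450, 1500):
    share = dark_fraction(21, capacity_gb)
    print(f"{capacity_gb} GB disk, 21 GB used: {share:.0%} dark")
```

The bigger the disk, the closer the dark share gets to 100%, which is why the smaller drives look attractive for this kind of use.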

written by Emmanuel Florac, June 25, 2009
written by Mike, June 26, 2009
In reality, what percentage of customers 'really' need performance at this level? Of course there are some, but I don't believe it's the majority.
written by Chris Fricke, June 26, 2009
written by Mike, June 27, 2009
Of course we must allow for scaling, but how far do you go?
written by Chris Fricke, June 28, 2009
Basically, I have to meet capacity needs within a limited budget and use experience, basic best practices, and a bit of gut instinct to also provide adequate performance and data protection. Performance-wise, I don't have an environment that is "barely adequate" but something I like to call "pretty darn good, all things considered". Real scientific, I know.
In concept what you are saying is very valid. It's just not very practical for some of us. Not yet anyways.






Another great blog post.