Storage Monkeys Blogs

Rants and Raves from the community
mk408

Having recently read more and more discussion about so-called dark storage, I've been reminded of something I routinely try to impress upon managers, especially clients: unless your use case is archiving, total bytes is a poor metric for storage.

In fact, the term "storage" itself may be partly to blame for the continued misconception. One need only glance at the prices of commodity disks to recognize that there isn't anything near a linear relationship between cost and bytes stored.

A quarter century ago was the golden age of the mini-computer, and the reign of the micro- was dawning. The Fujitsu Eagle was, at least in the semiconductor industry here in Silicon Valley, very popular, so it will be my yardstick. At a third of a gigabyte in usable space and just under 1.9MB/s, one could read or write the whole thing in just under 3 minutes. Today, a 1.5TB Barracuda is 4500 times the size but only 66 times the throughput, so it takes over 3 hours to go through the whole thing. A 6th-generation 450GB Cheetah is better, at under an hour.
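The full-pass times above are capacity divided by sustained throughput. Below is a minimal Python sketch of that arithmetic in decimal units. The Barracuda rate is taken as 66 x 1.9 MB/s, as the text implies. The Cheetah's ~130 MB/s average is an assumption, not a quoted specification.

```python
def full_pass_seconds(capacity_gb: float, throughput_mb_s: float) -> float:
    """Time to read or write every byte once at a sustained rate (decimal units)."""
    return capacity_gb * 1000 / throughput_mb_s

# Eagle figures from the text. Modern rates: the Barracuda is 66 x the Eagle,
# per the text; the Cheetah's ~130 MB/s average is an ASSUMED figure.
EAGLE_GB, EAGLE_MB_S = 1 / 3, 1.9
disks = {
    "Fujitsu Eagle": (EAGLE_GB, EAGLE_MB_S),
    "1.5TB Barracuda": (1500, 66 * EAGLE_MB_S),
    "450GB Cheetah": (450, 130),
}

for name, (gb, mb_s) in disks.items():
    minutes = full_pass_seconds(gb, mb_s) / 60
    print(f"{name:16s} {minutes:7.1f} min")
```

Under these assumptions it prints about 2.9 minutes for the Eagle, about 199 minutes for the Barracuda, and about 58 minutes for the Cheetah.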

I like the Eagle's 3 minutes as a rule of thumb. Three minutes of sustained transfer works out to about 21GB on larger, modern 7200 RPM disks, and I suggest that everything beyond that may as well be considered superfluous or archive storage. Applying this measure end-to-end means that one would want only 72GB accessible to a host per 4Gb/s FC link, or 216GB per 4x SAS wide port. Ouch.
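The rule of thumb inverts the previous calculation: useful capacity equals sustained throughput times the 3-minute budget. Here is a small Python sketch of that. The per-link rates are assumptions: roughly 120 MB/s for a large 7200 RPM disk, about 400 MB/s for 4Gb/s FC, and 4 lanes x 300 MB/s for a 3Gb/s SAS wide port.

```python
BUDGET_SECONDS = 3 * 60  # the Eagle's ~3-minute full pass, used as the yardstick

def useful_capacity_gb(throughput_mb_s: float, budget_s: float = BUDGET_SECONDS) -> float:
    """Capacity that can be swept end to end within the time budget, in GB."""
    return throughput_mb_s * budget_s / 1000

# ASSUMED usable rates in MB/s; adjust for your own hardware.
links = {"7200 RPM disk": 120, "4Gb/s FC": 400, "4x SAS": 4 * 300}
for name, mb_s in links.items():
    print(f"{name:14s} {useful_capacity_gb(mb_s):6.1f} GB")
```

With these rates the results are 21.6GB, 72.0GB, and 216.0GB. The first lands near the post's 21GB, and the other two match the FC and SAS figures.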

A whitepaper from Xiotech criticizes storage vendors' performance numbers as misleading because they are based on short-stroked benchmarks rather than on the performance of the whole disk.

I suggest that short-stroking disks as a matter of course and leaving the rest purposefully "dark" is smart engineering. Suddenly, those 160GB drives look much more appealing than the 1.5TB ones, at least for performance-sensitive uses, such as databases.

Certainly, there are use cases where data beyond the 3-minute limit is still useful: anything that is rarely, if ever, read. That tends to include backups, archives, audit trails, and even database intent logs. One may be able to have all these coexist on the same spindles as the "high performance" uses, but it would require careful forethought and testing.

Applying my 21GB figure to a 160GB disk leaves 87% of it "dark" in order to simulate an Eagle. That's a high percentage, but nothing to be alarmed about, as long as it's done with full awareness.
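The 87% figure is one minus the used fraction, 1 - 21/160. This is a minimal Python sketch. It assumes the same 21GB slice applies to a 1.5TB drive, which is reasonable only because both are 7200 RPM disks with similar sustained throughput.

```python
def dark_fraction(used_gb: float, disk_gb: float) -> float:
    """Fraction of a short-stroked disk left deliberately unallocated."""
    if not 0 <= used_gb <= disk_gb:
        raise ValueError("used capacity must be between 0 and the disk size")
    return 1 - used_gb / disk_gb

# ASSUMPTION: the 21GB "Eagle slice" is kept the same on both 7200 RPM disks.
for disk_gb in (160, 1500):
    print(f"{disk_gb:5d} GB disk: {dark_fraction(21, disk_gb):.1%} dark")
```

This prints 86.9% for the 160GB disk and 98.6% for the 1.5TB disk. That shows why the smaller drive is the less wasteful way to simulate an Eagle.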


Comments (13)
jpolk
...
written by Jan Polking, June 24, 2009
"...total bytes is a poor metric for storage". This is very true.

Another great blog post.
wazoox
you're right...
written by Emmanuel Florac, June 25, 2009
Right on spot. And still unaware people always come back with such stupidities as "that much? But I can buy a 1TB for 100 bucks at the supermarket!".
mk408
Thanks
written by Max Kalashnikov, June 25, 2009
Problem is, I still have that reaction when vendors are selling the same 1TB for 4 figures.
ChrisFricke
...
written by Chris Fricke, June 25, 2009
Sounds kind of like why wide striping was invented. :)
michaelnz29
I think dark storage is still waste
written by Mike, June 26, 2009
I agree with you that, for performance, dark storage isn't necessarily a problem. It's still wasteful to have this unallocated storage in an array when it could be used for other purposes; arrays and disks cost power and space in an already full data centre.
In reality, what percentage of customers 'really' need performance at this level? Of course there are some, but I don't believe it's the majority.
ChrisFricke
...
written by Chris Fricke, June 26, 2009
Very true. I can say for sure that my penny-pinching government organization is absolutely capacity-driven, with performance a much lower priority after protection, reliability, cost and, of course, capacity. Naturally, if I put in a system that was stupid slow, they'd think more about performance, but from a break/fix perspective rather than as a driving business need.
mk408
OK, so what's the alternative?
written by Max Kalashnikov, June 26, 2009
The performance level I'm proposing is a flexible one. However, I'd suggest there is some number of minutes we can slide it to such that the answer to your question is, effectively, 100%.

Moreover, I think you're assuming that using unallocated space is free (already paid for). Although I agree that allocating it and merely having data sit there is free, accessing it is not. Try this empirically: measure the current draw of an array of disks that are idle, short-stroking, random-seeking, and long-stroking.

Also, how much performance is lost going from short-stroked to full-stroke access?

Now compare that to the alternative of 1.5TB variable-speed drives, each of which holds more than the "dark" space of ten 160GB drives and can fit in 1/4U or less.
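The "more than ten" comes from dividing the big drive's capacity by the dark space left on each short-stroked 160GB drive, which is 160GB minus the 21GB in use. This is a small Python sketch of that arithmetic. The 21GB slice is carried over from the post as an assumption.

```python
def drives_replaced(big_gb: float, small_gb: float, used_gb: float) -> float:
    """How many small drives' worth of unused ("dark") space one big drive holds."""
    dark_per_small = small_gb - used_gb  # dark space on each short-stroked drive
    return big_gb / dark_per_small

# ASSUMED: 21GB in use on each 160GB drive, as in the original post.
print(f"{drives_replaced(1500, 160, 21):.1f}")
```

It prints 10.8, which means one 1.5TB drive holds the dark space of a little under eleven short-stroked 160GB drives.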

What are your results?
mk408
Stupid slow?
written by Max Kalashnikov, June 26, 2009
What constitutes unacceptably slow in an environment where capacity is the driving business need?

Surely the best engineering practice there would be to get the highest bit-density with just barely adequate performance.

I'm very curious about something quantifiable here, since it would provide at least one real-world data point against my 3-minute stab at it.
michaelnz29
...
written by Mike, June 27, 2009
Stupid slow is the point at which application performance no longer meets customer (end user) needs. To my mind, that is the balance of price vs. performance a company should be aiming for, not Stupid Performance (to continue the Stupid theme, lol) when it isn't necessary.

Of course we must allow for scaling but how far do you go?
mk408
Quantify!
written by Max Kalashnikov, June 28, 2009
What are the performance numbers in your environment that would barely meet the customer's needs?

I also realize there is another useful point on the scale, perhaps much tougher to gauge, which is the customer's wants. That is, a level of performance above which improvements are imperceptible to the user. I take it that anything above that is the scaling allowance to which you refer.
ChrisFricke
...
written by Chris Fricke, June 28, 2009
The quantification is the hardest part. If the optimum performance metrics vary by application, and I have a multitude of applications sharing the same pile o' disks, then where and what do I measure? I admit this is an area I struggle with in my environment. Heck, with most of the apps I don't know what the optimum numbers are, nor do I have the time to figure it out. The apps people don't know. The vendors don't know. The requirement is presented as "we need this many GB", not "we need this many IOPS". That seems to be the general mindset of the industry. Certainly at the level I'm playing at.

Basically, I have to meet capacity needs within a limited budget and use experience, basic best practices, and a bit of gut instinct to also provide adequate performance and data protection. Performance-wise, I don't have an environment that is "barely adequate" but something I like to call "pretty darn good all things considered". Real scientific, I know. :)

In concept what you are saying is very valid. It's just not very practical for some of us. Not yet anyways.
mk408
...
written by Max Kalashnikov, June 30, 2009
I certainly think you've nailed the root of the difficulty: unrelated apps using the same set of disks, and the vendors encouraging this and other practices (including using inadequate metrics like IOPS and total capacity) designed to keep their margins very high indeed.

Since I'm a pragmatist at heart, I believe this is mostly a matter of education. It's my answer to the cliche "what has been your biggest challenge" interview question: it's continuously educating (even technically apt) managers that, if they want the best bang for their buck, they mustn't think of storage using only one- and two-dimensional metrics. The physical reality of disks has at least 3 dimensions (including time), and it shows.
lasswellt
...
written by Thomas Lasswell, July 14, 2009
Quoting Emmanuel Florac: "Right on spot. And still unaware people always come back with such stupidities as 'that much? But I can buy a 1TB for 100 bucks at the supermarket!'"

You have no idea how many times I've heard that. :(
