BasRaayman's technical diatribe

My personal thoughts on technology and its obfuscation
BasRaayman
I received a link to an article containing an interview with Symantec's Mathew Lodge about their view on data deduplication. I couldn't help but notice the following quote:
According to a recent survey by Applied Research, more than half of all organizations expect to spend more on storage in 2009 than they did in 2008. But at the same time, the latest Symantec State of the Data Center Report indicates that storage utilisation hovers at just 50%.
Now, that got me thinking about a couple of things. First off, I tried to look up this survey. Unfortunately, the results from Applied Research-West seem to be beyond my Google skills. On the other hand, they seem to be the standard company Symantec uses for surveys, and those surveys somehow tend to produce results that are aligned with Symantec's product portfolio. Talk about a coincidence!
Anyway, as they said, "more than half of all organizations expect to spend more on storage in 2009 than they did in 2008", and I was pondering how this could be. We are seeing technologies like the deduplication mentioned in the article. Almost all vendors are able to offer something similar. The same can be said about thin or virtual provisioning. Heck, thanks to the effort in the blogosphere and feedback from partners and customers, EMC even decided to change its policy and make virtual provisioning free for the V-Max, DMX4 and DMX3.
It seems a bit odd that almost all storage vendors are delivering methods to reduce the disk space footprint in their SAN and NAS, but we still see an increase in expenditure. Sure, the licensing costs for such new features have to be included. And perhaps you even need to buy new hardware to fully utilize such new features. But all of the big vendors are quick enough to tell us the return on investment when we purchase new stuff. So that can't be it, right?
And you know what? They are right!
Simple enough, we don't know how much disk space our users need! Hell, most of the time, the user himself doesn't even know! And then there's the fact that it's too easy to get new storage.
We provision like there's no tomorrow. Not just disk space, but also computational power. You need to test something? Here, have a VM and go right ahead. What? You're on Solaris? No problem, here's a brand new sparkling zone, just for you. How much disk space do you need? Two Tera? No wonder we called it Terabyte, those are monstrous amounts of disk space.
I know the dilemma, and when you ask your users if they really need all of that, you usually get a blank look on their faces, outrage ("How dare you ask me that, isn't it obvious?"), or perhaps even an educated guess. Some will even give you forecasts... if you are lucky. Things will get better with technology like TP and dedupe. And things will get worse when we go for new technologies like cloud, but the fact of the matter is, we have made provisioning too easy, and we've somehow lost the art of asking if they really, really, really need it. Usually the answer to anyone provisioning is a simple "no".

Comments (13)
stephen foskett
...
written by stephen foskett, September 09, 2009
Good point here! The kit vendors don't actually care about utilization since they sell raw capacity, not used space. Go ahead and knock yourself out trying to improve utilization - they know (or believe) you'll fail!
thesantechnologist
Doritos and Storage vendors are the same...
written by thesantechnologist, September 09, 2009
Doritos: "Crunch All You Want, We'll Make More"

Storage Vendors: "Dedup, don't dedup, mirror, replicate, ALL YOU WANT, we will make more!"

At the end of the day, people don't delete data as quickly as they create it, so this will continue to be a problem.
BasRaayman
Re: Doritos and storage
written by BasRaayman, September 09, 2009
Stephen, thanks. :)

Regarding the Doritos. Well, those taste good, and once you've eaten some it's hard to keep your fingers off of them. ;) But aside from a couple of companies/audiences, how many people actually create new content? Usually we replicate and distribute more than we actually create (from scratch), or is that just my observation? When I check with colleagues, we create clones and testing environments, and some more clones, and perhaps have development systems that merge actual new data into a central repository, which we again replicate and back up.

Point is that we don't create that much new content, we spend way more storage and computational power on redundancy than on the creation of actual new data. Especially in bigger environments and the enterprise.
sunshinemug
...
written by sunshinemug, September 09, 2009
Interesting post, Bas! As far as your comment that "we don't create much new content" goes, I'd say it could depend on the industry. For example, what I'm hearing through Ocarina is that certain industries are creating immense amounts of content, quite often involving huge files that aren't easy to reduce or manage. Examples include: post-production studios, genomics labs, oil and gas ops, hospitals, and a few others I can't think of off the top of my head. And it ain't always easy to tell them to just delete, either.
BasRaayman
...
written by BasRaayman, September 10, 2009
Hey Sunshine,

Well, it's true that you will find examples that create huge amounts of data. Another example would be a stock exchange. On the other hand, all of this data still ends up being replicated way more than it's actually being used. Plus you have the ILM. If you take the data from the NYSE, it's only really valuable when it's brand new. As soon as the information ages it loses its value very quickly, up to a point where it's only useful or valuable to plot out historic graphs and perhaps spot trends.

It's all about implementing a proper ILM, or not? Even in the examples you mentioned, you would need huge files, and while you are actually processing the information contained in the data, you absolutely need redundancy. Heck, you might even need a lot of redundancy. But what happens when the data has been processed? How many companies actually reduce the redundancy because the information and data are no longer directly relevant? We don't really want to archive it to a cheap bulk medium (and perhaps just replicate that medium to an on/off-site copy) because we might need it again, and then we want it there. Immediately. New technologies help us reduce the amount of data stored on disks, but they won't bring us anything if we as a customer are not aware of what our own clients need (not want ;)) and don't follow up on whether those needs have changed over time.
sunshinemug
...
written by sunshinemug, September 10, 2009
Thanks for the response--and I agree that there are many examples like the ones you give above. Having done several in-depth customer interviews for Ocarina, I can say there are other examples that are a bit trickier. Situations in which new data cannot be deleted -- regulatory, research, HIPAA (for medical) and so on. Also, post-production studios have an interesting problem. They need to keep data online during the 18 months or so that it takes to build out an animated feature--and that can run in the 100s of TBs. After that time, the stuff can go to tape, but not before. So, interesting situation there with very large files that aren't necessarily snapped or mirrored or anything. But as I said, I agree that there is a management element here, and the more I learn about storage, the more I realize that our own human habits need to be looked at (the packrat phenom that some bloggers we know have discussed).
RBruklis
...
written by RBruklis, September 11, 2009
Good Stuff... The science of giving storage space is vastly easier... the art of asking why or how much is still clouded in mystery (pun intended).

Bottom Line: Humans are pack rats... we all need something in the moment and then hold on to it just in case... just like my '68 Mustang in the garage that hasn't moved in 2 years and just like the 5MB daily SQL reports I create, distribute, and hold onto forever.
storagenerve
...
written by storagenerve, September 14, 2009
You are on point with this post. When performing storage assessments for customers, we have typically seen a very similar case. There is a big demand for storage from DB admins and sysadmins, and it has really become hard not to allocate storage based on their current needs, even though those needs may not be valid 6 months or a year down the road. Analyzing customer storage environments yields between 20 and 30% reclaimable storage, and on top of that a ton of un-utilized storage meant for FUTURE GROWTH. Well, FUTURE GROWTH for what??? A company that might possibly be in bankruptcy, or shutting down datacenters, or possibly reducing its workforce, still ends up with a graph that shows an increase in the overall storage consumption.

I posted a series of blog posts related to Storage Resource Analysis, giving an overview of where this wastage was coming from and what the reasons for it could potentially be. http://storagenerve.com/2009/0...imization/



ChrisFricke
...
written by ChrisFricke, September 14, 2009
This year our storage budget is less than in previous years, meaning we will spend less in 2009 than in 2008. We are still maintaining 100% growth (give or take) and adding replication for DR type purposes, plus lots of continued server virtualization, but this year we're implementing the dedupe and storage virtualization we bought last year. Naturally, this allows us to shake the storage tiers blanket in a major way, which keeps us from having to purchase any T1 storage. Our storage money this year (what little there is) will likely be spent expanding the dedupe tier. The rest is absorbed by forcing higher utilization, aggressive tiering, and realistic expectations from our customers. Eventually it will catch up and things will need to shift again, but for the time being our current strategy helps our total budget stay flat (with the storage portion being much less).

What we haven't addressed, and have no idea how to, is the data lifecycle management. People are packrats, and even if we have it under control (capacity wise) we are still adding more and more data. It's easy for me to say that data is old and it has to go. It's not so easy for the people in various business units to let go of it. Who am I, as the admin, to say that file x that's ten years old has no value? If my customer says it has value then it has value... and I'll gladly buy appropriate storage for it as long as money is available.
sunshinemug
It's not just about storage anymore
written by sunshinemug, September 14, 2009
Wow, this discussion is really getting interesting! Great post. I showed it to Mike Davis at Ocarina and he wrote a post in response: http://onlinestorageoptimizati...nymore-2/.
BasRaayman
Responses
written by BasRaayman, September 17, 2009
Wow, Sunshine is right, this seems to have kicked off a slight discussion. Let me take the opportunity to give a reaction (sorry for the delay). :)

What everybody seems to agree on is that it's a good idea to ask if your customer really needs that much space or computing power. It's easy to say that you will just give him what he wants as long as he is paying for it, but regardless of whether you are dealing with internal or external customers it is, in my opinion, a bit short-sighted to just give the customer what he thinks he wants. Finding out what your customer actually needs is an art! There is a fairly well known comic out there that shows this in a very simple manner:



And I am stating here that the same thing goes for a customer ordering storage or computational power. Why not ask one more time to make sure that you are aware of what the customer wants? Why not describe to the customer in your own words what you have planned for him, and check whether it actually maps to his request? And if you have a feeling that he might need less, give him less. He'll thank you for it and probably be quicker to come by if he needs more, which incidentally is no problem with things like oversubscription and dedupe. We'll just give him more. ;)

ChrisFricke makes a nice point here which I want to comment on. He states that:
Who am I, as the admin, to say that file x that's ten years old has no value? If my customer says it has value then it has value...


Couldn't agree more! But how many of us actually come back to our customers after five or ten years and ask if the value of their older data has perhaps changed? Do we actually make a suggestion to archive it out? Do we check if data is still being accessed, and how frequently if at all? Do we say "Hey, you're not using this as much as you did, perhaps you want to reduce the SLA or change the storage tier of your data?" That's something I haven't seen or heard of that much, or am I wrong and am I asking the wrong people around me? Do any of you actually have such a strategy?
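
To make the "is this data still being accessed?" question a bit more concrete, here is a rough sketch of a first pass over a file tree. It is only an illustration: the function names `find_stale_files` and `summarize` are made up for this example, and it assumes the filesystem records access times at all (volumes mounted with noatime or relatime will give misleading results, and many arrays and NAS heads have their own, better reporting for this).

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_idle_days, now=None):
    """Return (path, size_bytes, idle_days) for files under root whose
    last access time is older than max_idle_days.

    Hypothetical helper for illustration; relies on st_atime being
    maintained by the filesystem, which is an assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_atime < cutoff:
                idle_days = int((now - info.st_atime) // SECONDS_PER_DAY)
                stale.append((str(path), info.st_size, idle_days))
    # Largest files first, since those are the best archive candidates.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


def summarize(stale):
    """Return (file_count, total_bytes) for a list from find_stale_files."""
    total_bytes = sum(size for _path, size, _days in stale)
    return len(stale), total_bytes
```

Calling `find_stale_files("/some/share", 365)` and passing the result to `summarize` would give you a file count and a byte total to take to the data owner. The output is a conversation starter ("you haven't touched this in a year, shall we move it to a cheaper tier?"), not a deletion list.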
BasRaayman
...
written by BasRaayman, September 17, 2009
Hmm, the comic didn't show up, so here's a link and another try to show the comic:



http://bp1.blogger.com/_TsCq7i...roject.jpg
ChrisFricke
...
written by ChrisFricke, September 17, 2009
You're right... I don't spend a lot of time going back to people asking: "Do you really still need these files?" Not to say that I've never done that ('cause there are occasionally very glaring examples) but it's not part of my routine. What we've done instead is implement file virtualization and automated storage tiering to move that old and infrequently accessed data around. Personally I think from a data perspective this is partially a workaround and doesn't really solve the core problem. Yeah, sure, maybe it's now more economical to store that old data, but the fact is you're still spending money to store data that isn't really needed anymore. Mixed in among all that is the legitimate data that needs to stay around "forever" for who knows what reason (policy, legality, etc). So far - the cost of disk technology has won over the hassle of data analysis.
