<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.7.3" -->
<rss version="2.0">
	<channel>
		<title>Is deduplication a strategy or a finger in the dike?</title>
		<description>Comments for Is deduplication a strategy or a finger in the dike? at http://www.storagemonkeys.com , comment 1 to 20 out of 20 comments</description>
		<link>http://www.storagemonkeys.com</link>
		<lastBuildDate>Mon, 06 Sep 2010 00:08:17 +0100</lastBuildDate>
        <generator>FeedCreator 1.7.3</generator>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-189</link>
			<description>I just read the whole conversation and now my head hurts. Thanks a lot! - Chris Fricke</description>
			<pubDate>Thu, 09 Jul 2009 14:34:15 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-186</link>
			<description>I would have to agree with you Curtis.  I think part of the confusion lies in the conceptual gaps and overlap. 

It certainly makes my job easier to use the term compression when discussing traditional intra-file data reduction techniques, and deduplication everywhere else. 
 - joseph martins</description>
			<pubDate>Thu, 09 Jul 2009 09:57:35 +0100</pubDate>
		</item>
		<item>
			<title>What they said</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-183</link>
			<description>I concur with most of what the last two posters said.  I do believe it is both a strategy for all storage in the future, and it is also a finger in the dike to help us migrate to a more disk-based approach to backup.

I personally do not refer to dedupe as compression.  It is in the generic sense, in that it shrinks the data, but I think calling it compression confuses people.  If it worked like compression (the concept we've known for years), then a 10:1 dedupe ratio would mean that I could store a 10 TB database in 1 TB of disk, when what it really means is that I can store my 1 TB database 10 times in 1 TB of disk.  And I'm constantly having to explain this to people, which is why I do not use the term. - W. Curtis Preston</description>
			<pubDate>Thu, 09 Jul 2009 09:11:44 +0100</pubDate>
		</item>
		<item>
			<title>So many thoughts, so little time.</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-179</link>
			<description>First, in response to the title of Tim's post, and [as he requested] in the context of backups alone, the [b][i]concept[/i][/b] of deduplication is an extremely important consideration for any data protection strategy. We can all agree that we'd like to store as little as possible, preferably in the least amount of space, and still meet or beat our day-to-day operational requirements. Deduplication is all about keeping the physical amount of stored data to a minimum.  And, faced with a future filled with mind-boggling amounts of new data, deduplication is a good thing.

What is important to understand is that deduplication (which appears to be a term born in the storage industry in the past decade) goes by many names and is best visualized as a spectrum of solutions designed to take the redundancy out of data. At one end of the spectrum we find file formats such as JPG, GIF, MP3, MPG, GZIP, TAR and SIT. These are examples of intra-file deduplication (a.k.a. file compression).  Further along the spectrum we find single instance storage, a method of inter-file dedupe that has existed in many business applications since at least the early-to-mid 90s, possibly earlier. It's a simple implementation that identifies [u]whole[/u] [byte for byte] duplicate files and stores a single copy. A lightweight system of  pointers or stubs ensures that applications are unaware of the underlying data reduction. As we continue to move along the spectrum we encounter even more efficient methods of deduplication such as data chunking (at the block or sub-file level) and delta encoding. And storage vendors have, in recent years, added a new wrinkle to deduplication: timing. Should we deduplicate before or after moving our data over the network from point A to point B?
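
To make the two middle points of that spectrum concrete, here is a minimal sketch (my own illustration, not any vendor's implementation) of single instance storage versus fixed-size chunking. The file names, the 4-byte chunk size and the use of SHA-256 as the fingerprint are all assumptions chosen to keep the example tiny.

```python
import hashlib

def single_instance_store(files):
    """Whole-file dedupe: keep one copy per distinct file content."""
    store = {}      # content hash: file bytes, stored once
    pointers = {}   # file name: content hash (the "stub")
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)
        pointers[name] = digest
    return store, pointers

def chunk_store(files, chunk_size=4):
    """Sub-file dedupe: fixed-size chunks, each distinct chunk stored once."""
    store = {}      # chunk hash: chunk bytes
    recipes = {}    # file name: ordered list of chunk hashes
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes

def stored_bytes(store):
    return sum(len(value) for value in store.values())

# Hypothetical sample data
files = {
    "a.doc": b"AAAABBBBCCCC",
    "b.doc": b"AAAABBBBCCCC",   # byte-for-byte duplicate of a.doc
    "c.doc": b"AAAABBBBDDDD",   # differs from a.doc only in the last chunk
}
raw = sum(len(data) for data in files.values())
sis, pointers = single_instance_store(files)
chunks, recipes = chunk_store(files)
print(raw, stored_bytes(sis), stored_bytes(chunks))   # prints 36 24 16
```

Whole-file matching only catches b.doc; chunking also catches the shared two-thirds of c.doc, at the cost of more bookkeeping per file.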

Commercial implementations of deduplication typically combine multiple methods, and all of them make trade-offs between complexity, efficiency and performance.  There is no single universally superior method or commercial implementation of deduplication. You guessed it - it all depends on what you're trying to accomplish.

And, it really doesn't matter whether we're talking about primary or secondary storage, old backup technology or new, near-line, off-line, local, remote, backup or archival storage. They can all benefit from deduplication whether it's embedded or bolted-on. 

Deduplication isn't a patch, it's an [i]integral [/i]part of efficient information and storage management. - joseph martins</description>
			<pubDate>Thu, 09 Jul 2009 00:24:15 +0100</pubDate>
		</item>
		<item>
			<title>Is deduplication a strategy or a finger in the dike? </title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-177</link>
			<description>Well, the first time I wrote about Tim's final question. This time I'll write about &quot;the finger in the dike&quot;.
.
Backup with dedup was originally meant as a solution to the problem that one would like to store backup data on disk, but needed too many disks for that purpose. That translated into huge costs, of course. Somewhere a smart guy thought of this solution and voilà, deduplication was born. 
.
A simplification of course, nowadays there are a lot of dedup solutions. And there are a lot of different methods to choose from:  gateway vs appliance, inline vs post-processing, etc. 
.
Good reading material: 
http://www.snia.org/education/tutorials/2009/spring/data-management/DanielBudiansky_Understanding_Data_Deduplication.pdf
.
Philosophically speaking: year-to-year online (as in &quot;used by all production and test environments&quot;) data growth rates are about 100%. I rarely encounter a lower growth rate; sometimes I come across a 150% growth rate or even more. 
.
Dedup vendors promise you dedup ratios anywhere from 1:20 to 1:50 or more. I believe that in real-world usage most dedup ratios are somewhere between 1:3 and 1:10. For example, TSM uses the incremental backup method for file backups, thus storing only changed files, and will show a lower dedup ratio. If you use NetBackup with daily full backups, dedup ratios will be a lot higher.
.
In the end I assume everyone agrees that when storing backup data on disk, dedup will help you reduce the disk capacity needed. That is not the same, however, as lowering your costs...maybe something for another blog to discuss.
.
As said above, disk usage growth rates are about 100% a year. Let's assume this rate is not growing anymore and stays at 100% for the years to come. This will mean that if today your dedup solution needs 5 TB net disk capacity, you will need 40 TB net disk capacity in three years! This is fairly simple math. At the end of year one you need double (100% growth) the capacity, at the end of year two you would need four times and at the end of year three you would need eight(!) times today's net disk capacity. Dedup'd, that is.
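.
For anyone who wants to plug in their own numbers, the doubling can be sketched in a few lines. The function name and parameters are just illustrative; the 5 TB start and 100% growth rate are the figures from the paragraph above.

```python
def capacity_after(start_tb, annual_growth, years):
    """Net capacity needed after compounding growth.

    annual_growth is a fraction: 1.0 means 100% growth per year.
    """
    return start_tb * (1 + annual_growth) ** years

for year in range(4):
    print(year, capacity_after(5, 1.0, year))
# year 0: 5.0 TB, year 1: 10.0 TB, year 2: 20.0 TB, year 3: 40.0 TB
```

With a 150% growth rate (annual_growth=1.5) the same 5 TB would grow even faster, which is why per-TB licensing hurts so much.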
.
A 5 TB dedup solution is very common today. Lots of organizations need much more capacity, ouch!
.
So yes, with current per-TB licensing of dedup solutions, you may definitely speak of a &quot;finger in the dike&quot; solution. I therefore strongly recommend looking at any suitable solution for which you do not need to pay per TB of storage. I have not found any so far, so if anyone here knows of any vendor which uses a different pricing scheme I'd gladly see your posting! - Alex Sons</description>
			<pubDate>Wed, 08 Jul 2009 20:36:45 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-176</link>
			<description>Good discussion, but I think we have wandered from Tim's query:[i] Is deduplication a strategy or a finger in the dike?[/i]
I propose that the answer is neither.  Deduplication is a compression technology.  It can take big piles of bits and make them smaller and fewer.  Sometimes very successfully.  Made viable by the increased processing power now cheaply available, it can be used in a variety of places in the data protection path to great advantage.  It may enable, along with other compression technologies, longer usage of existing resources.  One could claim that means &quot;finger in the dike&quot;, but so is any technology or process improvement that doesn't require replacing the entire data center.  

I believe what Tim is really asking, down in the meat of his blog, is &quot;Do traditional batch backup methods and applications still have a role in a modern data center?&quot;  I say &quot;yes&quot;; however, there are many places where better methods are available, as has been discussed above: mirroring and its close brethren BCV, CoW and CDP (logging), snapshots (usually a variant of delta differencing), etc.  All are nice, more or less real-time data protection methods.  However, there still is a role for batch backups, even to tape (gasp)!  It's especially useful when media changes are necessary, especially for offsiting, long-term archiving, relocations, disaster preparation, and such (sneakernet can still be faster and cheaper than Ethernet, and you get to keep a copy).  But for minimizing downtime, yes, there are much nicer things now. - T-SM Black</description>
			<pubDate>Wed, 08 Jul 2009 18:08:52 +0100</pubDate>
		</item>
		<item>
			<title>Use archiving software</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-175</link>
			<description>There are filesystem archiving products just like there are email archiving products. If you have files that should be kept long term, then you should use a filesystem archiving product to retrieve them.  Backup software is NOT designed to help you find a file that you haven't seen for three years, but archiving software is.

As to the part of your discussion where you're talking about space reclamation, that's an easy one.  A good archiving product can easily archive and then delete data for you.  No worries.

But I still agree with your last paragraph.  I just don't want them using it for archiving. ;) - W. Curtis Preston</description>
			<pubDate>Wed, 08 Jul 2009 12:19:43 +0100</pubDate>
		</item>
		<item>
			<title>Archiving versus backup?</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-174</link>
			<description>@wcpreston (and others?)
I really agree: one should use archiving more often than one does today. There is a problem however, i.e. when to decide data may be archived and when the data is still needed. Of course, email archiving is easy and commonplace, as received email will not be changed, only replied to. Email is often referred to as semi-structured data.

For unstructured data however it is almost impossible to know when it is viable for archiving (and deleting from the filesystem). So that data will stay upon your systems forever. Although tiering storage might help in moving unused data to lower-cost media, it will still be online, and maybe someday edited.

Thus long term storage of backup data is still needed. If an organization goes to great lengths to implement ILM techniques to structure as much data as possible, it will diminish the need for old-fashioned backup software, and dedup'd backup storage on disk or snapshotting becomes more viable as a complete solution.

In the end I think most organisations cannot do without competent old-fashioned backup software. In my practice all those new techniques are most suitable as an add-on to, not as a replacement for, that old-fashioned backup software.  - Alex Sons</description>
			<pubDate>Wed, 08 Jul 2009 11:58:37 +0100</pubDate>
		</item>
		<item>
			<title>Wrong?</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-173</link>
			<description>@Alexons

I don't see how what you said makes anything that we've been saying &quot;wrong.&quot;

I said that snapshots could be a replacement for backup if they were better than they are today.  You said they're a pain in the ass.  While I wouldn't go that far, we're essentially saying the same thing.  I would also say that I don't think that all snapshot products are a PITA, that some are actually quite nice.  (I'd put NetApp WAY out front here.)

You talked a lot about long term retention and discovery requests.  Those are both the job of an archive app, not a backup app, and I am sticking to my guns on that.  If one is keeping backup data for more than 18 months, I think one should re-examine what one is doing. - W. Curtis Preston</description>
			<pubDate>Wed, 08 Jul 2009 10:52:27 +0100</pubDate>
		</item>
		<item>
			<title>We'll agree to disagree</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-172</link>
			<description>I and many others are fine with the term &quot;near-continuous data protection.&quot;  You're not.  I tried to change your mind.  I give up. ;)

CDP is not just a journaling filesystem, because the idea behind a journaled filesystem is mainly about maintaining integrity, not the ability to go back in time, per se.  A journaling filesystem should help unravel whatever happened when your server had its power switch flipped on you.  This is the same as the role of transaction logs in a database; they are used to roll back transactions that were in progress when something bad happens.  Neither of these technologies, though, would roll you back to a point in time before that.  A CDP system keeps track of every single write in the order they happened.

Assume that your original is not damaged, and you told the CDP product to &quot;put me back to 5 minutes ago just before that idiot dropped a table.&quot;  It knows which blocks have changed in the last five minutes and can just put them back to the state that they were before that.  If the original was damaged, then it can be used as a standby copy while you're fixing the original, and it can restore the original if you had to completely reinitialize it.
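
If it helps to picture it, here is a toy sketch of that rollback. This is only an illustration under my own assumptions (an in-memory journal that records the previous contents of each block, with made-up timestamps and block contents); it is not how any particular CDP product is built.

```python
class CdpJournal:
    """Toy write journal: every write is logged in order, so the volume
    can be put back to its state at any earlier point in time."""

    def __init__(self, volume):
        self.volume = dict(volume)   # block number: current contents
        self.log = []                # (timestamp, block, previous contents), in write order

    def write(self, timestamp, block, data):
        # Remember what the block held before this write, then apply it.
        self.log.append((timestamp, block, self.volume.get(block)))
        self.volume[block] = data

    def roll_back_to(self, timestamp):
        # Undo, newest first, every write made after `timestamp`.
        while self.log and self.log[-1][0] > timestamp:
            _, block, previous = self.log.pop()
            if previous is None:
                self.volume.pop(block, None)   # block did not exist before
            else:
                self.volume[block] = previous

# Hypothetical timeline: the table gets dropped at t=300
journal = CdpJournal({0: "orders-v1", 1: "customers-v1"})
journal.write(100, 0, "orders-v2")
journal.write(200, 1, "customers-v2")
journal.write(300, 0, "DROPPED")
journal.roll_back_to(250)   # any point in time, not just a snapshot boundary
print(journal.volume)       # prints {0: 'orders-v2', 1: 'customers-v2'}
```

Only the blocks written after the chosen time are touched, which is why this kind of rollback can be so quick when the original is otherwise intact.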

This is similar to what NetApp can do with WAFL, but again, to a completely different level. Yes, snapmirror can be used to incrementally restore a volume to a previous point in time, but that point in time must be a snapshot.  With CDP, it can be any point in time.

As to your last question, Tim didn't use the term &quot;traditional backup software,&quot; you did.  What he said was that what we have today is &quot;not all that different from the backup software used 20 years ago.&quot; 20 years ago commercial backup software was in its infancy.  The product that would eventually become NetBackup was being used only by Control Data at the time. (They launched the AWBUS business unit in 1990 and launched BackupPlus in 1993.)  Legato was formed in 1988 and I don't think they were shipping yet.  Cheyenne's NetBack had just been introduced and would soon be replaced by Arcserve.  TSM wouldn't come out for 4 more years. Maynard Electronics had just released MaynStream.  They would be acquired by Archive Corporation, who would come out with Backup Exec in the early 90s.  I believe Alexandria had its birth in the early 90s, but can't find any reference about that. Hardly anyone was using commercial backup software in open systems 20 years ago.  (What I WILL say is that what they did for Mainframe backup has remained mostly unchanged for 20 years.  All the cool stuff we're talking about is only happening on open systems.)

20 years ago, the bulk of backup software in open systems was dump/tar/cpio to stand-alone tape.  That's what I did.  And to say that today's backup and recovery systems (when you include things like CDP, near-CDP, source dedupe software) are even remotely similar to what we did 20 years ago is simply not true.  And that was my point.

Maybe Tim's original point is that what MOST people are doing is fundamentally the same as what they did 20 years ago.  I will agree with that statement, but I also believe that this is because (for the most part) the industry hasn't offered them good enough options to switch. - W. Curtis Preston</description>
			<pubDate>Wed, 08 Jul 2009 10:44:30 +0100</pubDate>
		</item>
		<item>
			<title>How wrong you all can be?</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-171</link>
			<description>Wow. What misperceptions!

&quot;Why are you using traditional backup software?  (technical reasons only)&quot;

OK. First things first. I'm an expert (I think) in IBM's TSM backup/archiving software, not other products, so I'll answer it from a TSM perspective only, although most arguments would count equally for other major enterprise products.

1. Snapshots are a pain in the ass, especially for long term retention!
As you should all know, each problem gets its own solution. Snapshots really were not meant for dealing with backup problems but meant for speed of restore. As most restores are about yesterday's data, snapshots dealt with the inadequacy of backup software in delivering backup data at the speed levels required by today's organizations. CDP technology takes this one step further by restoring data that was just created and has never been backed up...

Snapshots are meant to be created on disk and kept on disk. If you could and(!) would move older snapshots to tape, consider the following. Say there is an EDP-auditor onsite and he requests last year's data from a specific application in order to compare it with what is stored today. Would you want to deal with recovering from a year's worth of snapshots? And what if he also needs two-year-old data? The same goes for requests for old data because of legal issues, which are not so uncommon anymore...

So in the end, you make backup/archive copies of your data also for long term purposes, with backup software! Snapshots and CDP really are technology mismatches.

2. Enterprise Backup Software shines in media management
Although TSM likely is the king here, media management is very important in dealing with loads of backup/archive data. The likelihood of needing backup data normally decreases over time, so it becomes viable to move older backup data to tape. Some applications require lots of small files to be archived and files to be delivered on request very quickly. In such organizations backup software like TSM can be used as an archive manager for files stored on WORM/UDO media, like Plasmon's libraries. 

3. Human errors or worse?
It's very easy to have human errors delete complete RAID-sets. Sometimes you don't even need human errors, just plain hardware faults can do the same for you. It's almost impossible to completely wipe out both the tape library and offsite volumes. Even if you have an angry administrator trying to wipe out all online and backed up data, it is not hard at all to prevent administrators from accessing offsite volumes.

4. Data Dedup?
Data deduplication is the answer to another problem. If you do want to store backup data on disk you quickly end up with loads of disks and loads of disk-related costs. Most of the time, however, you back up (almost) the same data. Dedup is a smart technology which greatly reduces the disk capacity needed for backup storage. It does not lessen the need for backup software which manages the stored backup data, whether it is stored on old-fashioned tape or fancy dedup'd disk.

I hope this helps. - Alex Sons</description>
			<pubDate>Wed, 08 Jul 2009 10:21:53 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-170</link>
			<description>How about &quot;over-CoW&quot;? :)

Thinking about what term I'd use for what you're calling CDP, I realized we're talking from different viewpoints. You're talking about the end goal (what), and I'm talking about the method/technology (how). I don't have a personal &quot;what,&quot; so it's not CDP, meaning I don't consider any particular thing near-CDP. Similarly, I don't consider traditional backups to be near-snapshots. Even suggesting that the term should be read as &quot;Near-continuous Data Protection&quot; is a stretch for me, since, as you point out, there's no time quantum for continuity.

Continuous journaling (aka logging) sounds very much like a log-structured filesystem. Adding a log-structured component to a block-structured filesystem sounds quite a bit like NetApp's WAFL, not at all coincidentally, I'm sure. My over-CoW suggestion, though facetious, does, again, bring up the question of administrative control: is there any?

Call me old-fashioned, but, since Veritas was the first vendor I came across, 15 years ago, to offer a split-mirror feature, it's hard to abandon their terminology. I do agree that such a thing is a copy, but it's not [i]just[/i] a copy. It's guaranteed to be a consistent/atomic, block-level copy (or snapshot, if you will) of the device at a point in time. 

But, yes, back to the original question, I think there's a prerequisite question that must be asked. What constitutes &quot;traditional backup software?&quot; I've been going on the assumption that it is what decides when, (perhaps most importantly) what, and how (or where) to back up. This assumption is, obviously, wrong, if we're to include separate pieces of code, such as the underlying filesystem. Tim? - Max Kalashnikov</description>
			<pubDate>Wed, 08 Jul 2009 09:03:00 +0100</pubDate>
		</item>
		<item>
			<title>Back to the point</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-169</link>
			<description>Before I get back to the point, let me address comments in your post. 

First, I want to say PLEASE don't refer to true CDP as &quot;excessive snapshots.&quot;  The official definition of CDP specifically precludes the use of snapshots to deliver it.  It does not use snapshots in any way.  It is complete journaling of every single write -- continuously -- or it's not CDP.  Even if a snapshot is taken every second, that's still not CDP.  As I've told the near-CDP vendors, even a second is a period of time, and periodic is an antonym of continuous.

Second, I think I see where we're arguing (maybe).  When I say snapshots, I do NOT include full-volume copies (i.e. split mirrors, BCVs, etc.).  Those are not snapshots, they are copies.  Only Veritas refers to a full-volume copy as a &quot;snapshot,&quot; and it's confused the issue for anyone that's read their documentation.  So when I use the term snapshot, I am referring to a virtual copy of a volume, not a full one.  

So back to the point.  The article said that &quot;One thing that has not changed is backup. The same backup software you are using today is not all that different from the backup software used 20 years ago.&quot;  And that just isn't true -- not even close.  Near-CDP (or snapshots and replication if you prefer) represents a significantly better way to do backups than the full/incremental backup system that was state-of-the-art 20 years ago and common in most backup systems today.

But all near-CDP systems are not created equal.  And unfortunately, I would argue that the requirements I specified are quite normal if what we're talking about is a backup system.  90 day retention of daily backups is a very normal requirement for a backup system.  So -- back to the point -- the question was asked &quot;why are we still using backup software?&quot;  My answer is that the alternatives are still not quite ready for many people's requirements, whether it's application awareness or the retention of enough data to meet typical operational recovery requirements -- many alternative systems are still not quite there.  (I think some of them ARE there, but I'm trying to answer the question posed in the original post.)

Let me defend the &quot;near-CDP&quot; term one more time, as it's a pretty common one.  Now that you realize I'm not talking about full-volume copies as snapshots it may help.  Both CDP and near-CDP incrementally transfer changed BLOCKS to a target system throughout the day.  True CDP transfers this data immediately and continuously.  Some near-CDP systems transfer changes immediately and continuously, others transfer changed blocks when you tell them to, but still do so incrementally, transferring only those blocks that have changed since the last time you told them to do so.  So they're very close on the backup side.    A true CDP system can recover to any point in time, and the near-CDP system can recover NEAR to any point in time.  (Much nearer than the typical 24-hour batch backup system can.)  This is why people call it near-CDP. - W. Curtis Preston</description>
			<pubDate>Tue, 07 Jul 2009 21:50:02 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-168</link>
			<description>I'll start calling CDP &quot;excessive-snapshots&quot;, since I still don't consider CoW snapshots to be &quot;near&quot; anything, but a distinct tool.

I'm an admin (and a technically minded one at that), so I'm naturally biased against giving over control to software which purports to be aware of something external to itself.

I still disagree that snapshots and traditional backups are only the same with regard to scheduling. In fact, neither has that as a characteristic. Rather, they have the characteristic that they [i]can[/i] be scheduled. They also share the characteristic of atomicity, which you have not addressed. A particular snapshot, just as a particular backup, can be kept, deleted, or replicated. Excessive-snapshotting takes away the scheduling and management options and therefore a substantial degree of control.  The two also share a commonality in the underlying full vs incremental option. A mirror snapshot would correspond to a full backup, with copying, whereas a CoW snapshot corresponds to an incremental backup.

The requirement you describe isn't stupid, merely large, which seems appropriate for a user who could be similarly described. The total number of snapshots is 251, though likely the biggest challenge is the 168 hourly snaps. I have not, in fact, done anything quite that extensive. The closest would be 4-hour snaps for a week with vxfs &quot;Storage Checkpoints.&quot; The trick was ensuring that there was enough spindle diversity, since, unlike what they call &quot;snapshots,&quot; these don't explicitly use a separate volume. I believe that, now, vxfs supports multiple volumes and complex allocation policies, so it may be easy.
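
For the record, here is how I read that total, as a quick sketch. The assumption (mine, spelled out here) is that the most recent 7 of the 90 daily snaps are already covered by the hourly ones, so only the older dailies count extra. The function and parameter names are made up for illustration.

```python
HOURS_PER_DAY = 24

def retained_snapshots(hourly_days=7, daily_days=90):
    """Count snapshots kept under an 'hourly for a week, daily for 90 days' policy.

    Assumes the dailies inside the hourly window are not kept separately.
    """
    hourly = hourly_days * HOURS_PER_DAY
    older_dailies = max(daily_days - hourly_days, 0)
    return hourly, older_dailies, hourly + older_dailies

print(retained_snapshots())   # prints (168, 83, 251)
```

If the dailies were kept in addition to the hourlies, the count would be 258 instead.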
 - Max Kalashnikov</description>
			<pubDate>Tue, 07 Jul 2009 12:52:47 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-167</link>
			<description>@DavidB

The problem is not the taking of the snapshots.  It's the keeping of the snapshots.  How many snapshots are you keeping on the primary storage?  For example, some people keep only one snapshot on the primary storage, and use their replicated target to hold history.  Others may take four a day, but keep one or two snapshots at a time and use one of them per day as a source for their backups.  

The problem is when you try to replace backups entirely by keeping a long history of snapshots on both your primary and replicated copy.  That's when you will see a performance degradation with many solutions.

Can you tell me a little more about what you're doing? - W. Curtis Preston</description>
			<pubDate>Tue, 07 Jul 2009 11:03:09 +0100</pubDate>
		</item>
		<item>
			<title>I think we're close</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-166</link>
			<description>@mk408

I think you misunderstand me! ;)

I don't care where the application awareness comes from.  I just think it needs to exist. I also think that many admins will need it to be in the app or they won't have any application awareness.  There are MANY people that are not comfortable with writing scripts to make something happen.

As to the whole goal/means discussion, I don't think we disagree there either.  The goal is to minimize business interruption (downtime) and lost transactions, (lost work which must be repeated or lost).  CDP does well at both. Near-CDP does well at the first, and much better than batch backup on the second.

As to snapshots and backup being similar, I couldn't disagree more.  One creates duplicated data (full backups and full file incrementals) and needs dedupe; the other eliminates duplicate data (only doing delta-level transfers once a day/hour/minute).  And here's the big one:  backup requires a restore; near-CDP (snapshots) do NOT.  You just mount the volume and you're off and running.  The only way they are the same is that they are done on a scheduled basis.  By that comparison, filling up my gas tank is the same as backup.

Yes, the implementations I looked at used separate volumes for snapshot data and were very proud of that.  Ask them to take a snapshot hourly, keep those for a week, and keep one daily for 90 days.  Watch them run, or watch them (as in the case of a certain large vendor and another large customer I witnessed) tell you that your requirement is stupid.  BTW, if you've done what I'm describing on COW storage, I'd love to talk more about what you've done offline. - W. Curtis Preston</description>
			<pubDate>Tue, 07 Jul 2009 10:59:41 +0100</pubDate>
		</item>
		<item>
			<title>FalconStor CDP</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-165</link>
			<description>We've been using FalconStor CDP (which I think Curtis would classify as &quot;near cdp&quot;) with Oracle snapshot agents and it has worked flawlessly with Oracle. We take four snapshots per day and barely notice any performance degradation. Eventually we will be adding Exchange agents to protect our email system, which we currently back up with CommVault. - David Bowers</description>
			<pubDate>Tue, 07 Jul 2009 10:24:48 +0100</pubDate>
		</item>
		<item>
			<title>@wcpreston</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-164</link>
			<description>I think you misunderstood me, in that I believe we agree that CoW snapshots are, themselves, a form of deduplication.

We also agree that NetApp snapshots, though mimicking the functionality of CoW, are implemented in a more clever fashion.

We further agree that anything batch-based, be it traditional backup or snapshots, needs to be application-aware overall. However, I think we disagree where that awareness must lie. I'm suggesting that it need only be at the level of the human administrator, not built into the technology.

Where we clearly disagree is in point of view on PIT recovery. I don't consider arbitrary PIT (a.k.a. CDP) to be a goal, but, rather, a method/solution. I consider the goal to be before-point-in-time recovery. That is, the business case is to recover to a point before a known-bad state.

To this end, both snapshots and traditional backups provide a similar, in my mind, route: they are both batch or single point in time based. They can also both be copied, deleted, archived, or otherwise administratively handled. This seems impossible with CDP.

Where we may not disagree, though where I focus my skepticism, is performance. However, a 90% decrease is consistent with my own observations of typical sequential versus contentious loads. Did the solutions you looked at use the same spindles for the main data as for the snapshot storage? If so, the performance drop is likely due to the particular implementation and not anything fundamental to CoW. - Max Kalashnikov</description>
			<pubDate>Tue, 07 Jul 2009 08:29:09 +0100</pubDate>
		</item>
		<item>
			<title>You got it</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-163</link>
			<description>Not only do I think they are viable alternatives, I think they are the best options today.  But the backup world moves very slowly.

I think that it is ultimately how &quot;backups&quot; will be done in the future, but it is going to take at least 10 years to catch on.  Meanwhile, we keep applying band-aids like disk-backup and dedupe, cause that's what the people want.  Little bits of change at a time.

It's not the frequency of snapshots but the number of them you keep that's the problem.  With NetApp the answer is 256 as long as you want with no problem.  With everyone else I've tested, divide that number by 10 at least if you want no performance degradation. - W. Curtis Preston</description>
			<pubDate>Tue, 07 Jul 2009 08:20:01 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=is-deduplication-a-strategy-or-a-finger-in.html&amp;Itemid=136#comment-162</link>
			<description>I'd like to understand the scenario where CoW is not as good an option as backup software - Michael Mendez</description>
			<pubDate>Tue, 07 Jul 2009 08:03:29 +0100</pubDate>
		</item>
	</channel>
</rss>
