Posted by: Tim
on July 2, 2009
Tagged in: Untagged
There is no doubt that the various deduplication technologies address a specific set of problems. For the purposes of this blog post, I'm going to focus on backup deduplication.
When most companies look at backup deduplication, they are typically trying to solve one or more of the following problems:
- Backup window - data growth is causing the backup process to bleed past the backup window and into production
- Retention on disk - either retaining some portion of backups on disk or growing the number of days data is being retained on disk without ridiculously large amounts of disk storage
- Replication - reduce the amount of data that needs to be replicated to a remote site thereby lowering the cost of bandwidth
IT departments looking for one of these solutions have identified a problem with their backup processes and are looking for a solution to fix the problem. Simple enough... but are these problems REALLY the issue or are they symptoms of a larger problem with the traditional backup?
Posted by: mk408
on June 24, 2009
Tagged in: Untagged
Having recently read more and more discussion about so-called dark storage, I've been reminded of something I routinely try to impress upon managers, especially clients: unless your use case is archiving, total bytes is a poor metric for storage.
In fact, the term "storage" itself may be partly to blame for the continued misconception. One need only glance at the prices of commodity disks to recognize that there isn't anything near a linear relationship between cost and bytes stored.
Posted by: Tim
on June 22, 2009
Tagged in: Untagged
When we first started to maintain a list of "Top Blogs" it was an arbitrary list with essentially 5 people picking our favorite blogs. Well... the Storage Monkeys community is big enough now that it is time to let the group select the top 10 blogs... and we'll kick it off with Vendor Blogs first. So please... begin nominating by leaving a comment so we can build a list for people to vote on...
Posted by: mk408
on June 10, 2009
Tagged in: Untagged
As a UNIX veteran who has a vague recollection of /dev/drum, I keep thinking that it would be really nice to have a device to swap to that's somewhere between disk and memory in terms of speed and cost (total installed cost, not just each module).
Mostly, I feel constrained by the 32-48GB limits on moderately priced ($1-3k) servers. To go higher, for even modest processor speeds, is a $5-$10k premium. Moreover, DRAM doesn't really wear out, and it would be nice to put older, lower density modules to use.
The trouble is, what I've found so far is either very low capacity, priced much higher than the memory modules themselves, or both. I'm not particularly interested in adding 4GB of fast swap to a 48GB machine, though ACARD has something for $250 with a 48GB limit, with high density modules, defeating my second purpose. Similarly, I'm not interested in paying $10k for 16GB of RAM SSD ($625/GB?!) when I could just dump that money into the base server and get much faster access.
I'm not a hardware guy (in the EE sense), so I'm genuinely curious about this. Is it really that difficult/expensive to stick a memory controller (northbridge?) onto a SATA interface? Am I being too cynical in assuming that it's mere market "segmentation" without a low-end consumer segment? Does what I described already exist with the name "motherboard," just without the appropriate software/firmware to appear as a disk instead of a host on its SATA and/or SAS ports? LSI, are you listening?
Posted by: storagedude
on June 9, 2009
Tagged in: Untagged
We are in the process of evaluating a new storage platform. Can anyone offer comparison info on the HP EVA4400 vs NetApp FAS2050?
Posted by: mk408
on June 9, 2009
Tagged in: Untagged
The choice of the unit of measure of storage is interesting to me because it's otherwise tough to measure price for performance.
I remain agape at the price tag on high-end, supposedly high-performance, storage systems. Connected by Fibre Channel (4Gb/s) or gigabit Ethernet, such a system is limited to about 400 or 110 MB/s, respectively. (Yes, I know of 8Gb/s FC and 10GE, but these are prohibitively expensive, if supported. Even link-aggregated GigE practically tops out at 880MB/s.) I'm thinking that writes across 40 7200RPM disks could saturate an FC link, and it would take fewer than 20 15k disks. Neither of these strikes me as an impractical or unusual size for a storage array, even doubling those numbers for RAID 1. More importantly, such arrays don't strike me as high performance.
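The arithmetic behind those disk counts can be sketched roughly as follows. This is only a back-of-the-envelope illustration: the function names are made up for this example, the link figures are the ones quoted above, and the 10 MB/s per-disk rate is simply what the 40-disk estimate implies, not a measured value.

```python
import math

# Link limits quoted in the post (MB/s).
FC_MB_S = 400     # Fibre Channel figure from the post
GIGE_MB_S = 110   # gigabit Ethernet figure from the post


def implied_per_disk_rate(link_mb_s, disks):
    """Per-disk throughput (MB/s) at which `disks` drives exactly fill the link."""
    return link_mb_s / disks


def disks_to_saturate(link_mb_s, per_disk_mb_s):
    """Smallest whole number of disks whose combined throughput reaches the link limit."""
    return math.ceil(link_mb_s / per_disk_mb_s)


if __name__ == "__main__":
    # 40 disks filling the FC link implies about 10 MB/s per 7200RPM disk.
    print(implied_per_disk_rate(FC_MB_S, 40))
    # "Fewer than 20" 15k disks implies each sustains more than 20 MB/s.
    print(implied_per_disk_rate(FC_MB_S, 20))
    # At the same assumed 10 MB/s per disk, gigabit Ethernet fills up much sooner.
    print(disks_to_saturate(GIGE_MB_S, 10))
```

Under these assumptions a single gigabit Ethernet link is saturated by about a dozen 7200RPM disks, which is the sense in which such arrays sit at their performance limit.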
Particularly shocking is that a brand name "SAN" solution of such a size would cost in the neighborhood of a quarter million dollars and be at its performance limit. Granted, it might be half that price without fancy management and replication software, whereas the less fancy alternative, at one tenth to one fifth the cost, would still be expandable from a performance standpoint. How much does the Veritas database suite cost these days?
Posted by: mk408
on June 6, 2009
Tagged in: Untagged
Recently, I had a discussion with a colleague about storage performance, and he kept talking about IOPS, whereas I have always measured it in the perhaps more traditional bytes per second. Since IOPS is effectively the reciprocal of latency, I have tended to ignore it for disk storage, as I have yet to see any use case which is synchronous, let alone sensitive to sub-centisecond latencies.
The alleged use case is a random write-heavy Oracle instance. I confirmed with a DBA I know that Oracle's block sizes will range from 4 to 32KiB. That suggests that the worst case random I/O can't occur, as the payload for each operation will be between 8 and 64 sectors. Still, I no longer have the data for benchmarks I ran, to be able to quantify how much of a difference this might make.
I can, however, quantify what I've observed in terms of throughput numbers. A commodity 500GB 7200RPM SATA drive can do around 100MiB/s for sustained, sequential I/O. It drops to around 10MiB/s for sustained, contentious (though not rigorously, statistically random) I/O. If it can do around 100 IOPS, the payloads must be much larger than even 64 sectors, closer to quadruple that number. Perhaps the Linux scheduler queue combined with NCQ gets enough adjacency for that fourfold increase.
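As a rough check of the numbers above, the payload per operation is just throughput divided by IOPS. The sketch below assumes 512-byte sectors (which is what makes 4 to 32KiB come out to 8 to 64 sectors); the function names are invented for this illustration.

```python
SECTOR_BYTES = 512   # assumed sector size
KIB = 1024
MIB = 1024 * 1024


def sectors_per_block(block_kib):
    """Number of sectors in one database block of `block_kib` KiB."""
    return (block_kib * KIB) // SECTOR_BYTES


def payload_sectors(throughput_mib_s, iops):
    """Average sectors moved per I/O operation at a given throughput and IOPS."""
    return throughput_mib_s * MIB / iops / SECTOR_BYTES


if __name__ == "__main__":
    print(sectors_per_block(4), sectors_per_block(32))   # Oracle block range in sectors
    observed = payload_sectors(10, 100)                   # 10MiB/s at roughly 100 IOPS
    print(observed)                                       # sectors per operation
    print(observed / sectors_per_block(32))               # multiple of a 32KiB block
```

With these inputs the average payload comes to about 205 sectors (roughly 100KiB) per operation, a bit more than three times the 64-sector maximum, which is in the neighborhood of the fourfold figure given the approximate inputs.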
Posted by: Tim
on June 3, 2009
Tagged in: Untagged
Now that NetApp has counter-bid EMC's offer for Data Domain, you have to wonder what losing the deal would mean for the storage giant.
There is some speculation that EMC was merely ginning up the cost of Data Domain for rival NetApp but I'm not so sure. EMC chief Joe Tucci was pretty clear that the acquisition of Data Domain was an effort to add the technology to their portfolio of deduplication solutions. I'm not sure if a vendor needs four or five flavors of deduplication technology to be competitive but EMC certainly believes it.
So what happens if EMC doesn't get Data Domain?
Posted by: DavidB
on June 1, 2009
Tagged in: Untagged
The just-announced EMC bid of $1.8 billion is amazing in that it appears that it's just another PIECE of the deduplication portfolio for EMC.
That's an expensive piece.
It's an especially expensive piece when you consider that Quantum has a market cap of just $246 million... and already owns the IP of EMC's current deduplication solution.
Posted by: jpolk
on May 21, 2009
Tagged in: Untagged
NetApp's acquisition of Data Domain is interesting on a number of levels.
NetApp has always viewed Data Domain as a competitor in dedupe, but rather than fight the battle, their strategy was to give dedupe away for free as a feature. Now with the $1.5 billion acquisition, all of a sudden dedupe is a revenue generator. Are NetApp customers who are deduping primary storage for free now going to pay a premium to dedupe backup data?
There is probably going to be a good dose of engineering to integrate the two dedupe systems. If you are a current NetApp customer deduping your primary storage today, what does the backup flow (with dedupe/rehydration) look like when you add a completely separate deduplication system for backup? This is further complicated by NetApp's WAFL file system.