<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.7.3" -->
<rss version="2.0">
	<channel>
		<title>IOPS? Really?</title>
		<description>Comments for IOPS? Really? at http://www.storagemonkeys.com , comment 1 to 13 out of 13 comments</description>
		<link>http://www.storagemonkeys.com</link>
		<lastBuildDate>Thu, 09 Sep 2010 09:20:48 +0100</lastBuildDate>
        <generator>FeedCreator 1.7.3</generator>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-646</link>
			<description>&quot;First let me answer your question: the 8K transfers are being done by the DB to your disk.&quot;

Perhaps I didn't ask the question specifically enough. Remember that my context is that of engineering adequate capacity. In this case &quot;my disk&quot; is ambiguous, as it comprises many parts.

The size of the I/O system call the database would be performing may well be 8K, but to characterize it as a &quot;transfer&quot; is misleading. There are, at the very [i]least[/i], transfers from memory by the HBA out to its view of &quot;the disk,&quot; then transfers by the disk's embedded controller into its own memory, then, finally, transfers to the storage medium itself (what I call &quot;the spindle&quot;). These are all asynchronous from each other.

&quot;So, the same hardware that does 2500 IOPS can do a variety of different &quot;MB/s&quot; throughput, depending on the record size you use.&quot;

This is begging the question. The converse could just as easily be said, that hardware which does 100 MB/s (e.g. a single COTS disk) can do a variety of IOPS, depending on the record size used.

&quot;(considering you have enough bandwidth of course)&quot;

This parenthetical remark is the motivation behind my bringing all this up in the first place. Bandwidth is hardly so cheap and easy as to be effectively infinite. My allegation is that enough bandwidth is not at all a matter of course and that it is a more useful measure of [i]maximum[/i] performance than IOPS.

&quot;It's a simple equation.&quot;

It's only simple if all the variables are independent or constant. The problem is, they're not.

Saying that a system can do some number of IOPS, regardless of record size, becomes untenable with large enough record sizes. For some components (such as an SSD or a RAID5), this would even be true with [i]small[/i] enough record sizes.

I guess what I'm looking for in the way of a compelling example is where IOPS is a fundamental engineering constraint. That is, a commonly-used component which can't possibly do more than some (otherwise achievable) number of IOPS, without going to a different technology, as opposed to merely adding more instances of it.

For throughput, I can offer the system &quot;bus,&quot; such as PCIe. One can divide the channels up however one wants, but there's a finite number of them, around 7GB/s worth on modern chipsets.


 - mk408</description>
			<pubDate>Tue, 10 Aug 2010 05:57:28 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-644</link>
			<description>You seem to be ignoring the information that's been given to you. I'll try to answer a specific question you asked, so you might see it from your own perspective:

David Noonon said in the previous post:

&quot;Let’s say you have a sizable database and you find your response time to be averaging 30-40ms during the middle of the day after user complaints. You take a look at the MB/s throughput and see that the data LUN is only measuring 20MB/s. You know that an individual spindle can exceed this performance by far. You then take a look at IOPS and see that you are pushing 2500 IOPS because your transfer sizes are only averaging 8KB.&quot;

And you asked back:

&quot;Calculating it backwards from 20MB/s and 8K transfers would mean that 20MB/s is still the more relevant measure. For that matter, where are the 8K transfers, at the spindle itself?&quot;

First let me answer your question: the 8K transfers are being done by the DB to your disk.

Now, let's make it an equation for easier understanding:

20MB/s / 8KB = 2500 IOPS, or to make it even clearer:

20,000KB/s / 8KB = 2500 IOPS

If I understood correctly, your point is that the &quot;20MB/s&quot; should be more relevant than the &quot;2500 IOPS&quot;, just because they are related. However, the way the equation above is written shows very clearly that IOPS is actually a rate in the equation and that it also depends on the record size. So that single IOPS value (2500) will let you choose among several different &quot;MB/s&quot; and &quot;record size&quot; parameters, as long as dividing the throughput by the record size still gives 2500 IOPS. For example:

40MB/s / 16KB record = 2500 IOPS
80MB/s / 32KB record = 2500 IOPS
160MB/s / 64KB record = 2500 IOPS
320MB/s / 128KB record = 2500 IOPS

So, the same hardware that does 2500 IOPS can do a variety of different &quot;MB/s&quot; throughput, depending on the record size you use. And the record size is something mostly dependent on the kind of application you're using. You might have a DB doing operations around 8KB size each (which is true in most cases) or a streaming application doing 128KB per operation (also true in most cases). So on a system with 2500 IOPS, those two applications would achieve completely different throughput: 20MB/s for the DB and 320MB/s for the streaming.

Now, if you know how many IOPS your application requires and how many IOPS your hardware provides, it doesn't matter if your application does transactions of 8KB or 128KB (considering you have enough bandwidth of course), you know you can handle the amount of transactions, end of discussion.

Looking from that perspective, the throughput in MB/s is completely irrelevant. Perhaps you're just too focused on a DB application that will always have record sizes around 8KB or so, in which case the throughput in MB/s is simply proportional to the number of IOPS -- because in that very specific situation, you're considering a *fixed* record size, a constant -- making the other two variables directly related.

The only way to make sure you will have enough I/O for all of them is to measure how many operations they all do, regardless of the size, and see if your hardware provides enough IOPS.

Imagine this example: you have a hypervisor hosting 3 virtual machines, each of them different: a DB server, a web server, and a live streaming server. The DB server needs only 20MB/s, the web server only 10MB/s, and the live streaming VM needs 128MB/s. But each of those VMs would be doing transactions at completely different sizes. The DB would do transactions using records of 8KB (2500 IOPS in total), the web server would use, let's say, mostly 16KB transactions (625 IOPS), and the live streaming server would be doing transactions of 128KB in size (around 1000 IOPS). Together, they all do 128+20+10 = 158MB/s and 2500+1000+625 = 4125 IOPS. There are plenty of systems out there that can deliver 158MB/s without being capable of doing half of the required 4125 IOPS.
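
As a rough sketch of the arithmetic in the three-VM example above, here it is in Python. The helper name iops and the 1 MB = 1000 KB convention are assumptions made for illustration (the convention matches the 20,000KB/s figure used earlier); nothing here comes from a real monitoring tool.

```python
# Hypothetical helper; figures are the ones quoted in the three-VM example.
# Assumes 1 MB = 1000 KB, the same convention as the 20,000KB/s line.
KB_PER_MB = 1000


def iops(throughput_mb_s, record_kb):
    """IOPS = Throughput / Record Size."""
    return throughput_mb_s * KB_PER_MB / record_kb


# name: (throughput in MB/s, record size in KB)
workloads = {
    'db': (20, 8),
    'web': (10, 16),
    'streaming': (128, 128),
}

per_vm = {name: iops(mb_s, kb) for name, (mb_s, kb) in workloads.items()}
total_mb_s = sum(mb_s for mb_s, _ in workloads.values())
total_iops = sum(per_vm.values())

for name, value in per_vm.items():
    print(f'{name}: {value:.0f} IOPS')
print(f'total: {total_mb_s} MB/s, {total_iops:.0f} IOPS')
```

The point of summing both columns is that a system has to satisfy the 158MB/s and the 4125 IOPS at the same time; meeting one says nothing about the other.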

IOPS = Throughput / Record Size

Throughput = IOPS * Record Size

It's a simple equation. You can't simply ignore one of the parameters and rely merely on another. - bragatto</description>
			<pubDate>Thu, 29 Jul 2010 17:57:14 +0100</pubDate>
		</item>
		<item>
			<title>@handpix</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-515</link>
			<description>Sadly, nothing you've mentioned dissuades me from my original tone, if it even addresses my points, which I'm not sure it does.

To take you up on an analogy, I wouldn't look at fan speed [i]at all[/i] as a measure of health of a computer. Similarly, I routinely remove &quot;load average&quot; from all monitoring tools, especially those which might be used by someone without enough technical depth to ignore the metric in favor of more meaningful ones.

I'm not sure how iSCSI entered into the discussion, nor what a &quot;lop sided&quot; DBA is. The DBA I consulted for the Oracle information is one I trust specifically because if he were uninformed, he would have said so, rather than passing on apocryphal information. That's also just for Oracle. For the FOSS databases, I'm the DBA.

Your reference to a monitoring dashboard causes me to realize we may be talking about different senses of the word &quot;performance.&quot; What's relevant to me (and therefore colors my titular question) is capacity engineering and, therefore, measurement/benchmarking. A performance degradation &quot;lighting up&quot; something for a human to handle is something I consider to be way too late, anyway.
 - mk408</description>
			<pubDate>Thu, 31 Dec 2009 18:12:39 +0100</pubDate>
		</item>
		<item>
			<title>Diving in Late, but wanted to weigh in.</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-511</link>
			<description>It is important to consider IOPS as ONE performance metric when looking at performance. I have worked with some vendors, DBAs and companies that treat IOPS as the be-all and end-all metric. This is no different than looking at your CPU fan speed as the sole indication of your computer's health. The converse is also true. It is important not to look at raw throughput as the sole indication of performance or health of a SAN or storage solution.

Rather than entering a technical pissing contest, maybe I can throw some light onto what IOPS indicate in more detail (some recap may occur).

iSCSI uses a TCP session between the initiator and storage controller. While MPIO or multiple HBAs give multiple channels, just as with direct-connected SCSI, there is a single command path to the storage. While demand for access can come from multiple hosts (in the case of a VM server) or multiple sectors (in a DB), at the end of the IO pipeline is a single data path. This is similar to a router. You may have a router passing packets for hundreds of hosts, but it can only put one Ethernet frame on the wire at a time.

That being said, I view IOPS as very similar to Packets Per Second in the network world. The two are very nearly inbred cousins of each other. In networking, performance numbers are typically given in Packets Per Second (PPS) even though packets can range from 32 bytes to 1500 bytes (or more with jumbo frames or non-Ethernet technologies, etc.). This is exactly the same with storage and your arguments here. An IO could be one sector, or 100 sectors. The reason they choose PPS for performance is that the majority of the work is tied up in making routing or switching decisions that occur regardless of the packet's size. This is the same with an iSCSI operation. Whether it is a single-sector or a 10-sector read, there is an inherent amount of processing by the IO controllers, network stack, etc. that must occur.

So, long story short after that round of analogy and corollary, IOPS are like PPS. They are only one indication of performance. But only looking at a router's PPS count would leave you blind to other performance metrics. I only disagree with your tone that IOPS can be ignored, or are irrelevant compared to other stats. This is probably because of an encounter with an uninformed or lop sided DBA in your recent past. Take all of the information you have at your disposal (such as throughput, latency, spindle performance, cache queue depth, adjacency, and yes, IOPS) into account when factoring performance. If you ignore any one point, it is probably the one that will sideswipe you.

Now, after all that is said, experience over the years shows me that IOPS counts are usually the metric that naturally ends up giving you an alert LONG before anything else on the monitoring dashboards lights up.

That is my $0.02 worth. Cheers! - handpix</description>
			<pubDate>Wed, 23 Dec 2009 20:57:49 +0100</pubDate>
		</item>
		<item>
			<title>@DaveN</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-246</link>
			<description>My original question is whether IOPS is more relevant than MB/s.

What you describe is a configuration that storage vendors love to sell people on, but, as an end user, it's irrelevant to me. &quot;Storage processor&quot;? That's my RDBMS! ;) It knows nothing of LUNs, only block devices, so that's what I want to measure.

I'm certainly not on any &quot;traditional&quot; array, which provides some number of IOPS, the very measure I'm questioning in the first place. Where is this 150-160 IOPS per spindle coming from? Calculating it backwards from 20MB/s and 8K transfers would mean that 20MB/s is still the more relevant measure. For that matter, where are the 8K transfers, at the spindle itself?

Response time as a measure of contention seems flawed to me, since one can have increasing response times with even a single sequential transfer, so long as it overstuffs some bottleneck. - Max Kalashnikov</description>
			<pubDate>Mon, 20 Jul 2009 21:10:00 +0100</pubDate>
		</item>
		<item>
			<title>@mk408</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-245</link>
			<description>Your original question was whether IOPS was a relevant metric. What I am saying is that typically it is the quantity of operations, not the size of the operations, that is most relevant. This is not true when it comes to interconnects; however, interconnects (at least with FC) are rarely the bottleneck.
I have found that typically bottlenecks follow this order:
1. Spindle count
2. Cache
3. Storage Processors
4. Interconnects
When it comes to actually assessing whether there is contention (not enough I/O to go around), it is response time that really provides the most value.  Let’s say you have a sizable database and you find your response time to be averaging 30-40ms during the middle of the day after user complaints.  You take a look at the MB/s throughput and see that the data LUN is only measuring 20MB/s.  You know that an individual spindle can exceed this performance by far.  You then take a look at IOPS and see that you are pushing 2500 IOPS because your transfer sizes are only averaging 8KB.  Let’s also say that the read/write ratio is 80/20 and your RAID set is configured as a RAID-10.  If you are on a traditional mid-range array that produces (for instance) an average of 150-160 IOPS per spindle at ~5-8ms, and you only have 14 spindles allocated instead of 20, then that is likely an issue.  If you did have 20 allocated but were no longer achieving the IOPS/spindle you once were (prior to loading up the storage array with more enclosures perhaps), then you would start to take a look at cache and storage processor utilization.  
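
A back-of-the-envelope sketch of the spindle estimate above, in Python. It assumes the common rule of thumb that each host write on RAID-10 costs two back-end writes (one per mirror side); the function name spindles_needed and that write penalty are illustrative assumptions, not figures from any particular array.

```python
import math


def spindles_needed(host_iops, read_pct, write_penalty, iops_per_spindle):
    """Estimate spindle count from host IOPS and the read/write mix."""
    reads = host_iops * read_pct / 100
    writes = host_iops * (100 - read_pct) / 100
    # Assumption: every host write turns into write_penalty back-end writes.
    backend_iops = reads + writes * write_penalty
    return math.ceil(backend_iops / iops_per_spindle)


# 2500 host IOPS, 80/20 read/write, RAID-10 (assumed write penalty of 2)
for per_spindle in (150, 160):
    count = spindles_needed(2500, 80, 2, per_spindle)
    print(f'{per_spindle} IOPS/spindle: about {count} spindles')
```

Under these assumptions the estimate lands at 19 to 20 spindles, which is why 14 would likely be an issue.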
 It is possible to go too far the other way and get so hung up on spindle count that you forget about everything else.  I’ve seen several environments where they are using well over 50% of the storage processor capacity during peak times.  People often forget about the goal of being able to sustain a storage processor failure without significant performance degradation.
 - David Noonon</description>
			<pubDate>Mon, 20 Jul 2009 19:44:37 +0100</pubDate>
		</item>
		<item>
			<title>@DaveN</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-241</link>
			<description>The first question that popped into my mind was &quot;how are disk spindles not infrastructure?!&quot;

I believe I've already outlined what I believe it would take to exceed interconnect throughput with disk throughput. Are you refuting that?

What does it mean to increase I/O contention? In what unit is that measured?

Are you suggesting that latency, in this context, is something other than simply the reciprocal of IOPS? - Max Kalashnikov</description>
			<pubDate>Mon, 20 Jul 2009 10:46:33 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-236</link>
			<description>It is important to realize that MB/s is a key metric, but usually only in terms of infrastructure (SAN connectivity - ISLs, trunking, etc.). In a shared environment (in particular) it’s all about IOPS and latency. Oftentimes, with the size of disks today, you can exceed I/O capacity long before exceeding your shared disk capacity (in GB) or your throughput capacity. The most important metric is the one that you are likely to bottleneck on first, which is why IOPS is so critical. Thin provisioning and deduplication often exacerbate this issue since you load up more and more data into the same capacity, increasing I/O contention. It is important to note that some performance is gained back if enough cache exists to keep frequently accessed deduped blocks resident.
  
Obviously there are one-off situations where the above is not the case: purely sequential workloads such as media streaming or disk backups, for instance.

Another thing to note is that in some systems, even data that an operating system recognizes as sequential may be completely random. For instance, the way WAFL and ZFS take in random writes and stripe them sequentially to disk creates this interesting scenario.
 
Latency is really the measurement of performance that must always be monitored in a shared environment.  At the point that I/O contention in a given disk pool (with whatever databases or VMs that exist on it) becomes too saturated, response delay increases.  Eventually you can’t keep up with the number of transactions per second that you must achieve and are unable to meet defined SLAs.  
 - David Noonon</description>
			<pubDate>Sat, 18 Jul 2009 06:45:08 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-115</link>
			<description>OK. We're still talking about a minimum 4KB I/O, [i]8 times[/i] the disk's native block size. 

I'm also having trouble with the leap of logic that even those 4K operations are anywhere near statistically random. Especially for the case of databases such as calendar and email, I could make a very strong argument for relative adjacency and caching having strong applicability. Database software (including filesystems) works pretty hard to manage I/O efficiently, else why would they have anything but native block sizes?

I'd still be very interested to see some real-world data, even summarized, since the model is not the reality. I'm not about to eat the menu :) - Max Kalashnikov</description>
			<pubDate>Wed, 24 Jun 2009 15:33:05 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-113</link>
			<description>The importance of IOPS is in relation to block size. Even though Oracle's block sizes can be large, for many applications the record size of a DB transaction is a fraction of Oracle's block size (this isn't unique to Oracle; for that matter, Exchange is very similar). So imagine processing transactions that require random user records of 250 bytes/record. The smallest I/O size is (say) 4 kilobytes, and the transaction completion time is directly related to how fast the requisite 4KB block can be brought into memory.

This is where IOPS (and response times) are important. And the two aren't necessarily the reciprocal of each other, because IOPS can be limited by the rate requests are being made by the application(s), while response time is indeed the time between the origination and completion of requests.

Said simply, MB/s is a measure of how MUCH data you can move in a period of time (usually with as few as possible very large I/O requests), while IOPS measures HOW MANY I/O requests can be serviced in a period of time (usually a very large number of very small requests).

Think about it - you really don't care how quickly you can back up the entire Exchange server database - what you care about is how fast you can open a specific email or meeting request. The former is MB/s dependent, the latter is IOPS. - Barry A. Burke</description>
			<pubDate>Wed, 24 Jun 2009 11:31:56 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-79</link>
			<description>I'm curious what you mean by IOPS becoming critical in a SAN versus DAS. Is this really just a concurrency issue, such as with the multiple VMs scenario?

If so, I'm quite skeptical that mere concurrency and randomness of I/O make IOPS an interesting measurement (such as for bottlenecks), absent 512 byte operations and 100% randomness. Simple arithmetic and empirical evidence back my suspicion.

Could you provide some data as to IOPS having been the limiting factor in a shared situation? - Max Kalashnikov</description>
			<pubDate>Wed, 10 Jun 2009 10:50:21 +0100</pubDate>
		</item>
		<item>
			<title>...</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-78</link>
			<description>Oracle (and other databases) usually only do synchronous journal operations (it's necessary for data integrity). IOPS is hardly a problem in a DAS environment (one machine connected to one storage array) but easily becomes critical in a shared environment, either SAN or even plain old stupid NFS.

Another trend: set up 20 or 30 VMs on a RAID array, and you'll be surprised by how heavy and how random the disk activity is... The IOPS-hungry application right now isn't Oracle: it's VMWare! - Emmanuel Florac</description>
			<pubDate>Wed, 10 Jun 2009 10:31:20 +0100</pubDate>
		</item>
		<item>
			<title>Great point</title>
			<link>http://www.storagemonkeys.com/index.php?option=com_myblog&amp;show=iops-really.html&amp;Itemid=136#comment-77</link>
			<description>I've never really paid much attention to IOPS as a performance metric with storage specific to applications so I think your point is a good one. Nice post - Jan Polking</description>
			<pubDate>Sun, 07 Jun 2009 06:46:58 +0100</pubDate>
		</item>
	</channel>
</rss>
