Recently, I had a discussion with a colleague about storage performance, and he kept talking about IOPS, whereas I have always measured it in the perhaps more traditional bytes per second. Since IOPS is effectively the reciprocal of latency, I have tended to ignore it for disk storage, as I have yet to see any use case which is synchronous, let alone sensitive to sub-centisecond latencies.
The alleged use case is a random write-heavy Oracle instance. I confirmed with a DBA I know that Oracle's block sizes will range from 4 to 32KiB. That suggests that the worst case random I/O can't occur, as the payload for each operation will be between 8 and 64 sectors. Still, I no longer have the data for benchmarks I ran, to be able to quantify how much of a difference this might make.
I can, however, quantify what I've observed in terms of throughput numbers. A commodity 500GB 7200RPM SATA drive can do around 100MiB/s for sustained, sequential I/O. It drops to around 10MiB/s for sustained, contentious (though not rigorously, statistically random) I/O. If it can do around 100 IOPS, the payloads must be much larger than even 64 sectors, closer to quadruple that number. Perhaps the Linux scheduler queue combined with NCQ gets enough adjacency for that fourfold increase.
Back to Oracle, or, perhaps, any database: does it really perform I/O in a synchronous fashion, not even dispatching an operation until the previous one has succeeded? This strikes me as unlikely, especially in a high-concurrency environment, which is what I would assume anything with many random writes would be. Surely part of the whole point of something like intent logging is the ability to do (otherwise reckless) asynchronous writes, and logging is patently sequential.
Regardless, both theory and empirical observation lead me to the conclusion that real-world loads and capacities are more meaningfully measured in bytes, not operations, per unit time. Am I missing something?
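The payload estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 512-byte sector size and the function name are assumptions, and the 10MiB/s and 100 IOPS inputs are the rough figures quoted above, not measurements.

```python
SECTOR_BYTES = 512          # assumed classic sector size
MIB = 1024 * 1024           # bytes per MiB


def sectors_per_op(throughput_mib_s: float, iops: float) -> float:
    """Average payload per operation, in sectors, implied by a throughput and an IOPS rate."""
    bytes_per_op = throughput_mib_s * MIB / iops
    return bytes_per_op / SECTOR_BYTES


payload = sectors_per_op(10, 100)   # contended throughput, assumed IOPS
ratio = payload / 64                # compared with a 32KiB (64-sector) block
print(f"{payload:.1f} sectors per operation, {ratio:.1f}x a 64-sector block")
```

With these inputs the implied payload comes out at a bit over 200 sectors, or roughly three times a 64-sector block, which is in the same ballpark as the "quadruple" guess.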

written by Jan Polking, June 07, 2009
written by Emmanuel Florac, June 10, 2009
Another trend: set up 20 or 30 VMs on a RAID array, and you'll be surprised by how heavy and how random the disk activity is... The IOPS-hungry application right now isn't Oracle: it's VMware!
written by Barry A. Burke, June 24, 2009
This is where IOPS (and response times) are important. And the two aren't necessarily the reciprocal of each other, because IOPS can be limited by the rate requests are being made by the application(s), while response time is indeed the time between the origination and completion of requests.
Said simply, MB/s is a measure of how MUCH data you can move in a period of time (usually with as few as possible very large I/O requests), while IOPS measures HOW MANY I/O requests can be serviced in a period of time (usually a very large number of very small requests).
Think about it - you really don't care how quickly you can back up the entire Exchange server database - what you care about is how fast you can open a specific email or meeting request. The former is MB/s dependent, the latter is IOPS dependent.
written by David Noonon, July 18, 2009
Obviously there are one-off situations where the above is not the case: purely sequential workloads such as media streaming or disk backups, for instance.
Another thing to note is that in some systems, even data that an operating system recognizes as sequential may be completely random. For instance, the way WAFL and ZFS take in random writes and stripe them sequentially to disk creates this interesting scenario.
Latency is really the measurement of performance that must always be monitored in a shared environment. At the point that I/O contention in a given disk pool (with whatever databases or VMs that exist on it) becomes too saturated, response delay increases. Eventually you can’t keep up with the number of transactions per second that you must achieve and are unable to meet defined SLAs.
written by David Noonon, July 20, 2009
I have found that typically bottlenecks follow this order:
1. Spindle count
2. Cache
3. Storage Processors
4. Interconnects
When it comes to actually assessing whether there is contention (not enough I/O to go around), it is response time that really provides the most value. Let’s say you have a sizable database and you find your response time to be averaging 30-40ms during the middle of the day after user complaints. You take a look at the MB/s throughput and see that the data LUN is only measuring 20MB/s. You know that an individual spindle can exceed this performance by far. You then take a look at IOPS and see that you are pushing 2500 IOPS because your transfer sizes are only averaging 8KB. Let’s also say that the read/write ratio is 80/20 and your RAID set is configured as a RAID-10. If you are on a traditional mid-range array that produces (for instance) an average of 150-160 IOPS per spindle at ~5-8ms, and you only have 14 spindles allocated instead of 20, then that is likely an issue. If you did have 20 allocated but were no longer achieving the IOPS/spindle you once were (prior to loading up the storage array with more enclosures perhaps), then you would start to take a look at cache and storage processor utilization.
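The spindle arithmetic in that scenario can be sketched as follows. The figures (2500 IOPS, an 80/20 read/write ratio, 150-160 IOPS per spindle) come from the paragraph above; the function names are made up for illustration, and the write penalty of 2 is the usual rule of thumb for RAID-10 (each write lands on both mirrors), which ignores any help from cache.

```python
import math


def backend_iops(frontend_iops: int, read_pct: int, write_penalty: int) -> float:
    """Disk-level IOPS once each host write is multiplied by the RAID write penalty."""
    reads = frontend_iops * read_pct / 100
    writes = frontend_iops * (100 - read_pct) / 100
    return reads + writes * write_penalty


def spindles_needed(frontend_iops: int, read_pct: int,
                    write_penalty: int, iops_per_spindle: int) -> int:
    """Smallest whole number of spindles that covers the backend load."""
    return math.ceil(backend_iops(frontend_iops, read_pct, write_penalty) / iops_per_spindle)


# Assumed RAID-10 write penalty of 2; cache effects are ignored.
load = backend_iops(2500, 80, 2)
low = spindles_needed(2500, 80, 2, 160)
high = spindles_needed(2500, 80, 2, 150)
print(f"backend load {load:.0f} IOPS, needs {low}-{high} spindles")
```

Under those assumptions the backend load is 3000 IOPS, which is consistent with wanting about 20 spindles rather than 14.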
It is possible to go too far the other way and get so hung up on spindle count that you forget about everything else. I’ve seen several environments where they are using well over 50% of the storage processor capacity during peak times. People often forget about the goal of being able to sustain a storage processor failure without significant performance degradation.
written by handpix, December 24, 2009
Rather than entering a technical pissing contest, maybe I can throw some light onto what IOPS indicate in more detail (some recap may occur).
iSCSI uses a TCP session between the initiator and storage controller. While MPIO or multiple HBAs give multiple channels, just as with direct-connected SCSI, there is a single command path to the storage. While demand for access can come from multiple hosts (in the case of a VM server) or for multiple sectors (in a DB), at the end of the IO pipeline there is a single data path. This is similar to a router. You may have a router passing packets for hundreds of hosts, but it can only put one Ethernet frame on the wire at a time.
That being said, I view IOPS as very similar to Packets Per Second in the network world. The two are very nearly inbred cousins of each other. In networking, performance numbers are typically given in Packets Per Second (PPS) even though packets can range from 32 bytes to 1500 bytes (or more with jumbo frames or non-Ethernet technologies, etc). This is exactly the same with storage and your arguments here. An IO could be one sector, or 100 sectors. The reason they choose PPS for performance is that the majority of the work is tied up in making routing or switching decisions that occur regardless of the packet's size. This is the same with an iSCSI operation. Whether it is a single-sector or a 10-sector read, there is an inherent amount of processing by the IO controllers, network stack, etc. that must occur.
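One way to picture the fixed per-operation cost described above is a toy model in which every IO pays a constant overhead plus a transfer time proportional to its size. The overhead and bandwidth values below are invented purely for illustration, not measurements of any real iSCSI stack, and the function name is hypothetical.

```python
SECTOR_BYTES = 512                  # assumed sector size


def ops_per_second(size_bytes: int, overhead_s: float, bandwidth_bytes_s: float) -> float:
    """Operations per second when each op costs a fixed overhead plus transfer time."""
    return 1.0 / (overhead_s + size_bytes / bandwidth_bytes_s)


OVERHEAD_S = 100e-6                 # hypothetical 100 microseconds of per-op processing
BANDWIDTH = 1e9                     # hypothetical 1 GB/s data path

one_sector = ops_per_second(1 * SECTOR_BYTES, OVERHEAD_S, BANDWIDTH)
ten_sectors = ops_per_second(10 * SECTOR_BYTES, OVERHEAD_S, BANDWIDTH)
drop = (one_sector - ten_sectors) / one_sector
print(f"1 sector: {one_sector:.0f} ops/s, 10 sectors: {ten_sectors:.0f} ops/s, drop {drop:.1%}")
```

In this model a tenfold increase in payload costs only a few percent of the operation rate, because the fixed overhead dominates. That is the same reason router performance is quoted in PPS.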
So, long story short after that round of analogy and corollary, IOPS are like PPS. They are only one indication of performance. But only looking at a router's PPS count would leave you blind to other performance metrics. I only disagree with your tone that IOPS can be ignored, or are irrelevant compared to other stats. This is probably because of an encounter with an uninformed or lopsided DBA in your recent past. Take all of the information you have at your disposal (throughput, latency, spindle performance, cache, queue depth, adjacency, and yes, IOPS) into account when factoring performance. If you ignore any one point, it is probably the one that will sideswipe you.
Now, after all that is said, experience over the years shows me that IOP counts are usually the metric that naturally ends up giving you an alert LONG before anything else on the monitoring dashboards lights up.
That is my $0.02 worth. Cheers!
written by bragatto, July 29, 2010
David Noonon said in the previous post:
"Let’s say you have a sizable database and you find your response time to be averaging 30-40ms during the middle of the day after user complaints. You take a look at the MB/s throughput and see that the data LUN is only measuring 20MB/s. You know that an individual spindle can exceed this performance by far. You then take a look at IOPS and see that you are pushing 2500 IOPS because your transfer sizes are only averaging 8KB."
And you asked back:
"Calculating it backwards from 20MB/s and 8K transfers would mean that 20MB/s is still the more relevant measure. For that matter, where are the 8K transfers, at the spindle itself?"
First let me answer your question: the 8K transfers are being done by the DB to your disk.
Now, let's make it an equation for easier understanding:
20MB/s / 8KB = 2500 IOPS , or to make it even more clear:
20,000KB/s / 8 KB = 2500 IOPS
If I understood correctly, your point is that the "20MB/s" should be more relevant than the "2500 IOPS", just because they are related. However, the way the above equation is written shows very clearly that IOPS is actually a rate in the equation and that it also depends on the record size. So that single same IOPS value (2500) will let you choose among several different "MB/s" and "record size" combinations, as long as the throughput divided by the record size is still 2500 IOPS. For example:
40MB/s / 16KB record = 2500 IOPS
80MB/s / 32KB record = 2500 IOPS
160MB/s / 64KB record = 2500 IOPS
320MB/s / 128KB record = 2500 IOPS
So, the same hardware that does 2500 IOPS can do a variety of different "MB/s" throughput, depending on the record size you use. And the record size is something mostly dependent on the kind of application you're using. You might have a DB doing operations around 8KB size each (which is true in most cases) or a streaming application doing 128KB per operation (also true in most cases). So on a system with 2500 IOPS, those two applications would achieve completely different throughput: 20MB/s for the DB and 320MB/s for the streaming.
Now, if you know how many IOPS your application requires and how many IOPS your hardware provides, it doesn't matter if your application does transactions of 8KB or 128KB (considering you have enough bandwidth of course), you know you can handle the amount of transactions, end of discussion.
Looking from that perspective, the throughput in MB/s is completely irrelevant. Perhaps you're just too focused on a DB application that will always have record sizes around 8KB or so, in which case the throughput in MB/s is simply proportional to the number of IOPS -- because in that very specific situation, you're considering a *fixed* record size, a constant -- making the other two variables directly related.
The only way to make sure you will have enough I/O for all of them is to measure how many operations they all do, regardless of the size, and see if your hardware provides with enough IOPS.
Imagine this example: you have a hypervisor hosting 3 virtual machines, and each of those VMs is different: a DB server, a web server and a live streaming server. The DB server needs only 20MB/s, the web server only 10MB/s, and the live streaming VM needs 128MB/s. But each of those VMs would be doing transactions of completely different sizes. The DB would do transactions using records of 8KB (2500 IOPS in total), the web server would use, let's say, mostly 16KB transactions (625 IOPS), and the live streaming server would be doing transactions of 128KB in size (around 1000 IOPS). Together, they all do 128+20+10 = 158MB/s and they do 2500+1000+625 = 4125 IOPS. There are plenty of systems out there that can deliver 158MB/s without being capable of doing half of the required 4125 IOPS.
IOPS = Throughput / Record Size
Throughput = IOPS * Record Size
It's a simple equation. You can't simply ignore one of the parameters and rely merely on another.
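The equation can be applied to the three-VM example with a short sketch. The throughput and record-size figures are the ones from the example; decimal units (1MB = 1000KB) are assumed to match the arithmetic used there, and the names are illustrative only.

```python
KB_PER_MB = 1000    # decimal units, matching 20MB/s = 20,000KB/s in the example


def iops(throughput_mb_s: float, record_kb: float) -> float:
    """IOPS = Throughput / Record Size."""
    return throughput_mb_s * KB_PER_MB / record_kb


# (throughput in MB/s, record size in KB) for each VM in the example
workloads = {
    "DB": (20, 8),
    "Web": (10, 16),
    "Live Streaming": (128, 128),
}

per_vm = {name: iops(mb, kb) for name, (mb, kb) in workloads.items()}
total_mb_s = sum(mb for mb, _ in workloads.values())
total_iops = sum(per_vm.values())

for name, value in per_vm.items():
    print(f"{name}: {value:.0f} IOPS")
print(f"Total: {total_mb_s}MB/s and {total_iops:.0f} IOPS")
```

The DB contributes about an eighth of the bytes but well over half of the operations, which is why sizing on MB/s alone would understate what the hardware has to do.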


It knows nothing of LUNs, only block devices, so that's what I want to measure.



