Recently, I had a discussion with a colleague about storage performance, and he kept talking about IOPS, whereas I have always measured it in the perhaps more traditional bytes per second. Since IOPS is effectively the reciprocal of latency, I have tended to ignore it for disk storage, as I have yet to see any use case which is synchronous, let alone sensitive to sub-centisecond latencies.
The alleged use case is a random write-heavy Oracle instance. I confirmed with a DBA I know that Oracle's block sizes range from 4 to 32 KiB. That suggests that the worst-case random I/O can't occur, as the payload for each operation will be between 8 and 64 (512-byte) sectors. Still, I no longer have the data from the benchmarks I ran, so I can't quantify how much of a difference this might make.
I can, however, quantify what I've observed in terms of throughput numbers. A commodity 500GB 7200RPM SATA drive can do around 100MiB/s for sustained, sequential I/O. It drops to around 10MiB/s for sustained, contentious (though not rigorously, statistically random) I/O. If it can do around 100 IOPS, the payloads must be much larger than even 64 sectors: 10MiB/s divided by 100 operations per second is roughly 100KiB, or about 200 sectors, per operation, which is three to four times that number. Perhaps the Linux I/O scheduler queue combined with NCQ gets enough adjacency for that three- to fourfold increase.
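The arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope check, assuming 512-byte sectors and taking the round figures above (10MiB/s, 100 IOPS) at face value; the function name `payload_per_op` is made up for illustration.

```python
# Back-of-the-envelope check of the implied payload per I/O operation.
# Assumptions (not measurements): 512-byte sectors, 10 MiB/s, 100 IOPS.

SECTOR_BYTES = 512
KIB = 1024
MIB = 1024 * KIB


def payload_per_op(throughput_bytes_per_s, iops):
    """Average bytes moved per operation: throughput divided by IOPS."""
    return throughput_bytes_per_s / iops


payload_bytes = payload_per_op(10 * MIB, 100)
payload_sectors = payload_bytes / SECTOR_BYTES

# Largest Oracle block size mentioned above (32 KiB), in sectors.
oracle_max_sectors = (32 * KIB) // SECTOR_BYTES

ratio = payload_sectors / oracle_max_sectors

print(f"payload per op: {payload_bytes / KIB:.1f} KiB "
      f"({payload_sectors:.1f} sectors)")
print(f"largest Oracle block: {oracle_max_sectors} sectors")
print(f"ratio: {ratio:.2f}x")
```

The same relation, throughput = IOPS × average request size, converts in either direction, so a bytes-per-second figure and an IOPS figure only describe the same workload once the request size is known.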
Back to Oracle, or, perhaps, any database: does it really perform I/O in a synchronous fashion, not even dispatching an operation until the previous one has succeeded? This strikes me as unlikely, especially in a high-concurrency environment, which is what I would assume anything with many random writes would be. Surely part of the whole point of something like intent logging is the ability to do (otherwise reckless) asynchronous writes, and logging is patently sequential.
Regardless, both theory and empirical observation lead me to the conclusion that real-world loads and capacities are more meaningfully measured in bytes, not operations, per unit time. Am I missing something?

written by Jan Polking, June 07, 2009
written by Emmanuel Florac, June 10, 2009
Another trend: set up 20 or 30 VMs on a RAID array, and you'll be surprised by how heavy and how random the disk activity is... The IOPS-hungry application right now isn't Oracle: it's VMware!
written by Barry A. Burke, June 24, 2009
This is where IOPS (and response times) are important. And the two aren't necessarily the reciprocal of each other, because IOPS can be limited by the rate at which requests are being made by the application(s), while response time is indeed the time between the origination and completion of requests.
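A rough way to see why the two are not simple reciprocals is sketched below. It leans on Little's law (completed operations per second ≈ requests in flight ÷ response time), which is brought in here as an illustration rather than taken from the comment, and every number in it is invented. The function names are hypothetical.

```python
# Sketch: IOPS is not simply 1 / response time.
# Assumption: Little's law, throughput = requests in flight / response time.
# All figures are made up for illustration.


def sustainable_iops(response_time_s, outstanding=1):
    """IOPS the storage can complete with `outstanding` requests in flight."""
    return outstanding / response_time_s


def observed_iops(offered_iops, response_time_s, outstanding=1):
    """What a monitor would report: the application cannot be served
    faster than it asks, nor faster than the storage can complete."""
    return min(offered_iops, sustainable_iops(response_time_s, outstanding))


latency = 0.010  # 10 ms response time

# One request at a time: IOPS really is the reciprocal of latency.
serial = sustainable_iops(latency, outstanding=1)

# Eight requests in flight at the same latency: eight times the IOPS.
queued = sustainable_iops(latency, outstanding=8)

# An application that only issues 50 requests/s is reported at 50 IOPS,
# regardless of how quickly each individual request completes.
app_limited = observed_iops(50, latency, outstanding=8)

print(serial, queued, app_limited)
```

Under these assumptions the same response time is compatible with very different IOPS figures, depending on queue depth and on how hard the application is actually pushing.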
Said simply, MB/s is a measure of how MUCH data you can move in a period of time (usually with as few as possible very large I/O requests), while IOPS measures HOW MANY I/O requests can be serviced in a period of time (usually a very large number of very small requests).
Think about it - you really don't care how quickly you can back up the entire Exchange server database - what you care about is how fast you can open a specific email or meeting request. The former is MB/s dependent, the latter is IOPS dependent.
written by David Noonon, July 18, 2009
Obviously there are one-off situations where the above is not the case: purely sequential workloads such as media streaming or disk backups, for instance.
Another thing to note is that in some systems, even data that an operating system recognizes as sequential may be completely random on disk, and vice versa. For instance, the way WAFL and ZFS take in random writes and stripe them sequentially to disk creates this interesting scenario.
Latency is really the measurement of performance that must always be monitored in a shared environment. At the point that I/O contention in a given disk pool (with whatever databases or VMs that exist on it) becomes too saturated, response delay increases. Eventually you can’t keep up with the number of transactions per second that you must achieve and are unable to meet defined SLAs.
written by David Noonon, July 21, 2009
I have found that typically bottlenecks follow this order:
1. Spindle count
2. Cache
3. Storage processors
4. Interconnects
When it comes to actually assessing whether there is contention (not enough I/O to go around), it is response time that really provides the most value. Let’s say you have a sizable database and you find your response time to be averaging 30-40ms during the middle of the day after user complaints. You take a look at the MB/s throughput and see that the data LUN is only measuring 20MB/s. You know that an individual spindle can exceed this performance by far. You then take a look at IOPS and see that you are pushing 2500 IOPS because your transfer sizes are only averaging 8KB. Let’s also say that the read/write ratio is 80/20 and your RAID set is configured as a RAID-10. If you are on a traditional mid-range array that produces (for instance) an average of 150-160 IOPS per spindle at ~5-8ms, and you only have 14 spindles allocated instead of 20, then that is likely an issue. If you did have 20 allocated but were no longer achieving the IOPS/spindle you once were (prior to loading up the storage array with more enclosures perhaps), then you would start to take a look at cache and storage processor utilization.
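The spindle arithmetic in that example can be sketched as below. One assumption is added that the comment does not spell out: a RAID-10 write penalty of 2, since each front-end write lands on both halves of a mirror. The function names are hypothetical and the figures are the ones quoted above.

```python
import math

# Sketch of the sizing example above.
# Assumption (not stated in the comment): RAID-10 write penalty of 2.


def backend_iops(frontend_iops, read_percent, write_penalty):
    """Disk-level IOPS generated by a given front-end load."""
    reads = frontend_iops * read_percent / 100
    writes = frontend_iops - reads
    return reads + writes * write_penalty


def spindles_needed(backend, iops_per_spindle):
    """Smallest whole number of spindles that can absorb the back-end load."""
    return math.ceil(backend / iops_per_spindle)


frontend = 2500        # IOPS seen on the LUN
transfer_kb = 8        # average transfer size in KB
read_percent = 80      # 80/20 read/write ratio
raid10_penalty = 2     # assumed: every write hits both mirrors

throughput_mb_s = frontend * transfer_kb / 1000
backend = backend_iops(frontend, read_percent, raid10_penalty)

for per_spindle in (150, 160):
    needed = spindles_needed(backend, per_spindle)
    capacity_14 = 14 * per_spindle
    print(f"{per_spindle} IOPS/spindle: need {needed} spindles; "
          f"14 spindles supply {capacity_14} of {backend:.0f} back-end IOPS")

print(f"throughput: {throughput_mb_s:.0f} MB/s")
```

Under these assumptions the modest 20MB/s figure hides a back-end load that 14 spindles cannot absorb, which is the point of looking at IOPS and response time rather than throughput alone.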
It is possible to go too far the other way and get so hung up on spindle count that you forget about everything else. I’ve seen several environments where they are using well over 50% of the storage processor capacity during peak times. People often forget about the goal of being able to sustain a storage processor failure without significant performance degradation.


It knows nothing of LUNs, only block devices, so that's what I want to measure.





