I mean at the end of the day it really revolves around one thing: requirements. What are your requirements? This isn't always an easy question to answer, as those requirements generally come from the business or another department that doesn't really have insight into your data center technology solutions. More importantly, they often don't have any idea what their storage requirements are (in my experience). So instead of taking the time to discuss application performance needs, workload scalability, utilization times, backups, archives, high availability, etc., we revert to the capacity question.
Well how much storage do you need? STOP!!!
Capacity doesn't mean you're taking care of the big picture.
For example, I'm working with a customer that continues to deliver capacity. However, that capacity comes in the form of large-capacity 7.2k drives. When I throw a workload at it, I'm generating a metric ton of latency (closer to 6k CMD/S latency if we're looking at ESXTOP).
When I present my problem to the on-site IT staff, they're baffled that I'm generating that much workload. Is there a disconnect, or did someone interpret my request for storage from a different perspective? They thought the issue was capacity, but the issue is really performance.
What am I talking about here? The issue centers around one area of focus: IOPS.
IOPS, or Input/Output operations per second, is basically a way of measuring how many read and write requests your storage can service per second. To get a high number of IOPS you need multiple disks, and the IOPS each disk contributes depends largely on its speed.
For example a 7.2k RPM spinning disk is about 75 IOPS, 10K about 125, 15k about 175. (These are rough estimates so take them with a grain of salt)
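To see why drive speed matters more than drive size, here's a minimal sketch that multiplies those ballpark per-disk figures out across a disk group (the numbers are the rough estimates above, not vendor specs, and the function name is my own):

```python
# Rough rule-of-thumb IOPS per spinning disk by rotational speed
# (the ballpark figures from the text above, not vendor specs).
IOPS_PER_DISK = {"7.2k": 75, "10k": 125, "15k": 175}

def raw_array_iops(disk_type: str, disk_count: int) -> int:
    """Raw (pre-RAID) IOPS estimate for a group of identical disks."""
    return IOPS_PER_DISK[disk_type] * disk_count

# Twenty-four 7.2k drives give plenty of capacity but modest performance:
print(raw_array_iops("7.2k", 24))   # 1800
print(raw_array_iops("15k", 24))    # 4200
```

Same disk count, more than double the IOPS just by changing spindle speed, which is exactly the trap my capacity-focused customer fell into.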
SSDs don't use a motor or spinning platters, so RPM doesn't apply. Because they're flash-based, there's no seek time or rotational latency to wait on, and they deliver far more IOPS as a result.
SSD IOPS are a little more complicated to calculate, but they generally start in the thousands (figures vary by the flash technology used and by vendor). This is one of the big reasons why SSDs are so much more expensive than spinning disks. It's also why, when you go to a retailer and look up hard drives for your computer or server, the higher the RPM, the higher the price (when compared with drives of the same capacity).
Scott Lowe did an article for TechRepublic that does a much better job explaining the spinning disk IOP calculation requirements so I'll include them here:
IOPS calculations
Every disk in your storage system has a maximum theoretical IOPS value that is based on a formula. Disk performance -- and IOPS -- is based on three key factors:
- Rotational speed (aka spindle speed). Measured in revolutions per minute (RPM), most disks you'll consider for enterprise storage rotate at speeds of 7,200, 10,000 or 15,000 RPM, with the latter two being the most common. A higher rotational speed is associated with a higher performing disk. This value is not used directly in calculations, but it is highly important. The other two values depend heavily on the rotational speed, so I've included it for completeness.
- Average latency. The time it takes for the sector of the disk being accessed to rotate into position under a read/write head.
- Average seek time. The time (in ms) it takes for the hard drive's read/write head to position itself over the track being read or written. There are both read and write seek times; take the average of the two values.
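Putting those two time values together gives the classic theoretical maximum: one operation completes every (average latency + average seek time), so IOPS is just the reciprocal. A quick sketch, with hypothetical drive timings chosen for illustration:

```python
def theoretical_disk_iops(avg_latency_ms: float, avg_seek_ms: float) -> float:
    """Max theoretical IOPS for a single spinning disk:
    1 / (average latency + average seek time), with the
    millisecond inputs converted to seconds (hence the 1000)."""
    return 1000.0 / (avg_latency_ms + avg_seek_ms)

# Hypothetical 15k RPM drive: ~2.0 ms rotational latency, ~3.5 ms avg seek.
print(round(theoretical_disk_iops(2.0, 3.5)))   # 182

# Hypothetical 7.2k RPM drive: ~4.17 ms latency, ~8.5 ms avg seek.
print(round(theoretical_disk_iops(4.17, 8.5)))  # 79
```

Note how close those land to the rough per-disk estimates earlier: the math and the rules of thumb agree.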
Scott Lowe is a revered technologist and blogger/author from EMC. He wrote the Mastering vSphere books. I highly respect his work and knowledge.
When you aggregate these per-disk numbers across many disks, you get an IOPS calculation for the whole array.
Now if you want to do this yourself, I recommend looking into "The Cloud Calculator".
This calculator takes the capacity, speed, disk count, and read/write percentages, factors them into various RAID groups (common ones are RAID 5, 10, and 6), and calculates total IOPS based on those pieces of detail.
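If you want a feel for what such a calculator is doing under the hood, the key ingredient is the RAID write penalty: every front-end write costs multiple back-end disk operations. A minimal sketch, assuming the commonly cited penalty values (2 for RAID 10, 4 for RAID 5, 6 for RAID 6) and a function name of my own invention:

```python
# Commonly cited RAID write penalties: each front-end write costs
# this many back-end disk operations.
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def usable_iops(raw_iops: float, read_pct: float, raid: str) -> float:
    """Front-end IOPS an array can deliver, given raw back-end IOPS,
    the read fraction of the workload, and the RAID level.
    Solves raw = frontend*read% + frontend*write%*penalty for frontend."""
    write_pct = 1.0 - read_pct
    return raw_iops / (read_pct + write_pct * WRITE_PENALTY[raid])

# 24 x 15k disks (~175 IOPS each) = 4200 raw IOPS, 70/30 read/write mix:
for raid in WRITE_PENALTY:
    print(raid, round(usable_iops(4200, 0.70, raid)))
# RAID 10 keeps the most usable IOPS; RAID 6 pays the steepest write tax.
```

Notice how the same physical disks deliver very different usable IOPS depending on RAID level and read/write mix, which is exactly why capacity alone tells you nothing.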
THIS IS HOW YOU SHOULD GATHER YOUR STORAGE REQUIREMENTS!!!!
However, it's not that simple, I'm afraid. You have to understand the performance workload needs of your environment to quantify the read and write activity. For example, some databases have a high read workload if you have a lot of connections reading information from them (i.e. queries). Other databases have a lot of write activity (i.e. committing changes). Generally speaking, your application vendor will have documentation detailing how the environment interacts with storage and whether you need higher-end storage or not.
Some use cases have adapted. Microsoft restructured Exchange's architecture so you can use JBOD (Just a Bunch Of Disks) to deliver an e-mail solution on the cheap. Exchange doesn't need high-end storage, just a large quantity of disks running at slower speeds.
Other use cases like Data Center Virtualization or Virtual Desktops require more than JBOD. This is where companies like EMC, NetApp, Pure Storage, Tegile, Dell, HP, etc. love to talk about their storage arrays. Whether all-flash or a hybrid of flash and spinning disks, they're tailored to your organizational need.
Hopefully, next time you look at your infrastructure requirements, you'll factor in the storage performance needs and expectations before you immediately address your capacity requirement.
Just like Austin Powers used to say... "It's not the size that matters... it's how you use it..."
Thanks for reading and thanks for your time.