Wednesday, June 10, 2015

Why are we having the same storage discussion???


I feel that despite the efforts and advancements in the data center and virtualization space (specifically around storage), many IT professionals still have a tough time weighing capacity against performance.

At the end of the day it really revolves around one thing: requirements. What are your requirements? This isn't always an easy question to answer, because those requirements generally come from the business or another department that doesn't really have insight into your data center technology. More importantly, they often don't have any idea what their storage requirements are (at least in my experience). So instead of taking the time to discuss application performance needs, workload scalability, utilization times, backups, archives, high availability, etc., we revert to the capacity question.

Well how much storage do you need? STOP!!!

Capacity doesn't mean you're taking care of the big picture.

For example, I'm working with a customer that continues to deliver capacity. However, that capacity comes in the form of large-capacity 7.2k drives. When I throw a workload at it, I generate a metric ton of latency (specifically, closer to 6k on the CMDS/s and latency counters if we're looking at ESXTOP).

When I present my problem to the on-site IT staff, they're baffled that I'm generating that much workload. Is there a disconnect, or did someone interpret my request for storage from a different perspective? Where they thought the issue was capacity, the issue is really performance.

What am I talking about here? The issue centers around one area of focus: IOPS.

IOPS, or input/output operations per second, is basically a measure of how many read and write requests your storage can service per second. To get a high number of IOPS from spinning disks you need multiple disks, and the IOPS each disk delivers is driven largely by its rotational speed.

For example, a 7.2k RPM spinning disk delivers roughly 75 IOPS, a 10k drive about 125, and a 15k drive about 175. (These are rough estimates, so take them with a grain of salt.)

SSDs don't use a motor or spinning platters, so RPM doesn't apply; they're flash-based, and because operations run against memory cells rather than a mechanical head, they generate far more IOPS.

SSD IOPS are a little more complicated to calculate, but they generally start in the thousands (they vary by the technology used in the disk and by vendor). This is one of the big reasons SSDs are so much more expensive than spinning disks. It's also why, when you go to a retailer and look up hard drives for your computer or server, the higher the RPM, the more expensive the drive (compared with drives of the same capacity).

Scott Lowe did an article for TechRepublic that does a much better job of explaining the spinning-disk IOPS calculation, so I'll include the relevant piece here:

IOPS calculations

Every disk in your storage system has a maximum theoretical IOPS value that is based on a formula. Disk performance -- and IOPS -- is based on three key factors:
  • Rotational speed (aka spindle speed). Measured in revolutions per minute (RPM), most disks you'll consider for enterprise storage rotate at speeds of 7,200, 10,000 or 15,000 RPM with the latter two being the most common. A higher rotational speed is associated with a higher performing disk. This value is not used directly in calculations, but it is highly important. The other two values depend heavily on the rotational speed, so I've included it for completeness.
  • Average latency. The time it takes for the sector of the disk being accessed to rotate into position under a read/write head.
  • Average seek time. The time (in ms) it takes for the hard drive's read/write head to position itself over the track being read or written. There are both read and write seek times; take the average of the two values.
To calculate the IOPS range, use this formula: Average IOPS: divide 1 by the sum of the average latency in ms and the average seek time in ms (1 / (average latency in ms + average seek time in ms)).
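To see how that math shakes out, here's a minimal sketch in Python. The 2 ms rotational latency and 4 ms average seek time are illustrative assumptions for a 15k RPM drive (the quote doesn't give specific numbers), and the millisecond values have to be converted to seconds for the result to come out as operations per second:

# Rough per-disk IOPS estimate using the formula quoted above.
def disk_iops(avg_latency_ms: float, avg_seek_ms: float) -> float:
    """1 / (average latency + average seek time), with the times converted to seconds."""
    return 1.0 / ((avg_latency_ms + avg_seek_ms) / 1000.0)

# Assumed (illustrative) numbers for a 15k RPM drive: ~2 ms rotational
# latency and ~4 ms average seek time.
print(round(disk_iops(2.0, 4.0)))  # ~167 IOPS, in line with the ~175 estimate above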

Scott Lowe is a revered technologist and blogger/author, now at EMC, who wrote the Mastering VMware vSphere books. I highly respect his work and knowledge.

When you spread those per-disk numbers across many disks, you get an aggregate IOPS calculation for the whole pool.

Now if you want to do this yourself, I recommend looking into "The Cloud Calculator"

This calculator takes the capacity, speed, disk count, and read/write percentages, factors them into various RAID groups (common ones being RAID 5, 10, and 6), and calculates total IOPS from those details.
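The calculator does the heavy lifting for you, but the underlying math is roughly the standard RAID write-penalty calculation. Here's a sketch of that idea; the per-disk IOPS figures and write penalties below are the common rules of thumb (and the drive counts and read/write mix are made up for the example), not anything specific to The Cloud Calculator:

# Rough front-end (functional) IOPS estimate for a RAID group.
DISK_IOPS = {"7.2k": 75, "10k": 125, "15k": 175}          # rough per-disk estimates from above
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}  # back-end writes per front-end write

def functional_iops(disk_type: str, disk_count: int, raid: str, read_pct: float) -> float:
    """Front-end IOPS the RAID group can sustain at a given read/write mix."""
    raw = DISK_IOPS[disk_type] * disk_count               # raw back-end IOPS of all spindles
    write_pct = 1.0 - read_pct
    return raw / (read_pct + write_pct * WRITE_PENALTY[raid])

# Example: 12 x 10k drives in RAID 5 with a 70/30 read/write mix
print(round(functional_iops("10k", 12, "RAID 5", 0.70)))  # ~789 front-end IOPS

Notice how much the read/write mix and RAID level matter: the same twelve spindles deliver very different usable IOPS in RAID 10 versus RAID 6, which is exactly why the capacity question alone isn't enough.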

THIS IS HOW YOU SHOULD GATHER YOUR STORAGE REQUIREMENTS!!!!


However, it's not that simple, I'm afraid. You have to understand the performance needs of your environment to quantify the read and write activity. For example, some databases have a heavy read workload because lots of connections are reading information from them (i.e. queries), while others have a lot of write activity (i.e. committing changes). Generally speaking, your application vendor will document requirements detailing how the environment interacts with storage and whether you need higher-end storage or not.

Some use cases go the other way: Microsoft restructured Exchange's architecture so you can use JBOD (just a bunch of disks) to deliver an e-mail solution on the cheap. Exchange doesn't need high-end storage, just a large quantity of disks running at slower speeds.

Other use cases, like data center virtualization or virtual desktops, require more than JBOD. This is where companies like EMC, NetApp, Pure Storage, Tegile, Dell, HP, etc. love to talk about their storage arrays, whether all-flash or a hybrid of flash and spinning disk, tailored to your organizational need.

Hopefully, the next time you look at your infrastructure requirements, you'll factor in storage performance needs and expectations before you jump straight to the capacity requirement.

Just like Austin Powers used to say... "It's not the size that matters... it's how you use it..." 


Thanks for reading and thanks for your time. 

Monday, March 23, 2015

VMUG UserCon 2015 - St. Louis

The St. Louis area VMUG UserCon event was held last week at the Hyatt Regency at the Arch in downtown St. Louis. This one day event allowed numerous vendors to come and speak about the trending topics in the IT space.

Overall, I felt this event lacked diversity within the vendor space, but given the notice and the changes, the VMUG leadership who put this event together did an amazing job.

The overall theme seemed to alternate between hyper-converged infrastructure and flash-based storage technology providers (I counted six companies offering flash-based arrays).

The two biggest nods go to Simplivity and NUTANIX, in my opinion. Both offer their own 2U platform that provides compute, network, and storage capabilities. However, Simplivity impressed me slightly more by offering a UCS platform running on a C240 server that can be managed by UCS Manager (a later release managed by UCS Director is forthcoming).

With the cost of convergence being extremely high from big-name vendors and OEMs like EMC, Dell, NetApp, etc., companies like Simplivity and NUTANIX shine with a product that delivers the capacity your organization needs with performance and cost in mind. The price will only come down as the underlying disk technologies (specifically flash-based storage) get cheaper over time.

All in all, I was really impressed with the event and the attendance; a VMUG leader coordinating the event told me more than 400 people attended. Huge thanks go out to them and everyone else responsible for the setup and coordination of the event.

Below is some additional information about hyper-convergence (i.e. the software-defined data center) from Matthew Brisse at Gartner that I found interesting; I think it's a great read and guide.

Gartner Asset

  1. Define the application use cases and entry points for SDDC. Select use cases and entry points for SDDC such as self-service provisioning of IT infrastructure resources in support of cloud-based applications, improved IT and business process automation. Note: Software-defined security and policy-based orchestration can reside with a cloud management solution above the SDDC layer.
  2. Identify the abstracted infrastructure layers required by application and process use cases. Expose the layer of abstraction and virtualization requirements for storage, networking, compute and facilities components. Define infrastructure implementation requirements based on application and process pain points. Administrator-based requirements may be focused on a single data center pain point such as storage provisioning, while applications often require multiple data center technologies to be abstracted for end-to-end provisioning. Note that SDDC is an optional enablement architecture, and as such, not every component has to be abstracted or virtualized.
  3. Define abstracted/virtual infrastructure service. Define detailed data center services based on application and process requirements, not on current infrastructure capabilities. Examples of storage services can include provisioning, thick or thin logical unit number (LUN) assignments, snapshots, replication, cloning, and data deduplication or other data services. Implement services providing the greatest value to speed of execution with increased agility in support of cloud services provisioning and automation.
  4. Perform an infrastructure assessment focusing on use case requirements and data center services. Determine if infrastructure or architecture alternatives can fill gaps based on the ability to deliver abstraction, instrumentation, programmability (API), automation, policy-based management and orchestration capabilities. Hyper-converged integrated systems may be leveraged for faster time to value and increased agility. Facility-based operational technologies such as monitors and sensors should be integrated as part of workload placement based on power, temperature and other sensor metrics.
  5. Define abstracted/virtual policies. Define policies for infrastructure services and process requirements leveraging northbound and southbound APIs, policies and automation. Test and implement API interoperability extensively because the lack of SDDC maturity could see API instability impacting data center operations.
  6. Implement software-defined data center components. Implement the software-defined components based on use case. Test northbound and southbound APIs associated with the control and data planes to ensure infrastructure interoperability. Validate that policies and services can be automated within each infrastructure component. Perform IT service continuity and disaster recovery testing for each software-defined component.
  7. Integrate software-defined security. Identify and standardize on well-established processes and patterns that have to be secured throughout the entire SDDC environment. Each data center component will have infrastructure-specific security that must be orchestrated through API or scripting to ensure interoperability and workflow processes. Programmatically enforce security-based policies to ensure the workflow models are enforced across the infrastructure layers.
  8. Integrate policy-based orchestration and management. Select and implement an overarching policy-based solution to provide management and infrastructure orchestration. Policy-based orchestration and security requirements may be provided by a cloud management layer that resides above the SDDC. For example, OpenStack can help orchestrate the infrastructure by enabling a standard set of APIs and providing templates for common tasks.