Storage performance metrics: Five key areas to look at

We look at the essentials in storage performance metrics: capacity, throughput and read/write capability, IOPS and latency, and hardware longevity measured by failure rates

Assessing any investment in storage is a question of balancing cost, performance and capacity.

But with the growth of solid-state storage and cloud data storage services, evaluating storage systems can be complex. Nonetheless, there are key storage performance metrics and definitions that IT teams can use to simplify comparisons between technologies and suppliers.

We look at some of the most useful storage performance metrics – capacity; throughput and read/write; input/output operations per second (IOPS) and latency; mean time between failures (MTBF) and terabytes written (TBW); and form factors and connectivity – some of which are primarily of use for assessing on-premises storage, while others also apply to the cloud. We will cover cloud-specific storage performance metrics in a future guide.

1. Storage capacity metrics

All storage systems have a capacity measurement. Storage hardware today is largely measured in gigabytes (GB) or terabytes (TB). Older systems measured in megabytes (MB) have largely fallen out of use, though the megabyte remains a useful unit in areas such as cache memory.

One gigabyte of storage is 1,000MB, and a terabyte is 1,000GB. Petabytes (PB) contain 1,000TB of data, and large storage systems are often referred to as working at “petabyte scale”. A petabyte of storage is enough to host an MP3 file that will play for 2,000 years.

It is worth noting that although most storage suppliers quote capacities in decimal units, based on multiples of 1,000, some systems use units based on powers of two. By this definition, a kibibyte (KiB) is 1,024 bytes (2^10), a mebibyte (MiB) is 1,024^2 bytes and a gibibyte (GiB) is 1,024^3 bytes. These binary units continue upwards as tebibytes (TiB) and pebibytes (PiB), which is why a drive sold in decimal terabytes appears slightly smaller when an operating system reports its capacity in binary units.
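
As a rough illustration, the short Python sketch below converts a supplier's decimal capacity into the binary units an operating system might report. The 4TB figure is an illustrative example, not a reference to any particular product.

```python
# Minimal sketch: compare a supplier's decimal capacity with the
# binary units an operating system may report. Values are illustrative.
DECIMAL = 1000  # 1 kB = 1,000 bytes (supplier labelling)
BINARY = 1024   # 1 KiB = 2**10 bytes (how many operating systems count)

advertised_tb = 4                            # a "4TB" drive
capacity_bytes = advertised_tb * DECIMAL**4  # 4,000,000,000,000 bytes

# The same drive expressed in binary units (tebibytes)
capacity_tib = capacity_bytes / BINARY**4
print(f"{advertised_tb}TB advertised = {capacity_tib:.2f} TiB reported")
# Output: 4TB advertised = 3.64 TiB reported
```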

Storage capacities can apply to individual drives or solid-state subsystems, hardware arrays, volumes or even system-wide capacity, such as on a storage area network, or the provisioned storage in a cloud instance.

2. Throughput and read/write storage metrics

Raw storage is of little use unless the data can be moved in or out of a central processing unit (CPU) or other processing system.

Throughput measures how much data a system can read or write per second, typically quoted in megabytes or gigabytes per second. Solid-state systems, in particular, will have different read and write speeds, with write speeds typically lower.

The application will determine which of the two metrics matters more. For example, an application such as an industrial camera will need storage media with fast write speeds, whereas an archival database that is written once and read many times will prioritise read performance.

However, suppliers might use calculations based on average block sizes to market their systems. This can be misleading: calculating throughput (or IOPS, see below) from an “average” or small block size gives very different values from the same system’s performance under real-world workloads.
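
To see why block size matters, the following sketch multiplies a hypothetical IOPS rating by different block sizes. The throughput implied at large block sizes quickly exceeds what any real drive could sustain, which is why a single quoted figure can mislead.

```python
# Minimal sketch: the same device's IOPS figure implies very different
# throughput depending on the block size assumed in the calculation.
# The IOPS rating is illustrative, not taken from any specific product.
def throughput_mb_s(iops: int, block_size_bytes: int) -> float:
    """Throughput (MB/s) = IOPS x block size."""
    return iops * block_size_bytes / 1_000_000

IOPS = 100_000  # a plausible SSD small-block rating
for block in (4_096, 64 * 1_024, 1_024 * 1_024):
    print(f"{block:>9,} byte blocks: {throughput_mb_s(IOPS, block):,.0f} MB/s")
# 4KiB blocks imply ~410 MB/s, but 1MiB blocks imply ~105,000 MB/s -
# far beyond any real drive, so the IOPS rating cannot hold at that size.
```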

Manufacturers also distinguish between random and sequential read and write speeds. The sequential read or write speed is how quickly a given storage device can read, or write, a series of blocks of data.

This is a useful measure for large files or series of data, such as a video stream or a backup. Random read and write is often a more realistic guide to real-world performance, especially for local storage on a PC or server. SSDs hold a much greater advantage over spinning disks for random reads and writes, because there are no drive heads to reposition between blocks.

3. IOPS and latency storage metrics

Input/output operations per second (IOPS) is another “speed” measurement. The higher the IOPS, the better the performance of the drive or storage system. A typical spinning disk has IOPS in the range of 50 to 200, although this can be improved significantly with RAID and cache memory. SSDs will be 1,000 times or more faster. Higher IOPS does, however, mean higher prices.

IOPS measurements will also vary with the amount of data being written or read, as is also the case for throughput.

Latency is the time it takes for an input/output (I/O) request to be carried out. Some analysts advise that latency is the most important metric for storage systems in terms of real-world application performance. The Storage Networking Industry Association (SNIA) describes it as “the heartbeat of a solid-state disk”.

The latency for a hard disk drive (HDD) system should be between 10ms and 20ms (milliseconds). For solid-state storage, it should be just a few milliseconds. In practical terms, applications will expect about 1ms latency.
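
For readers who want to see latency first-hand, the sketch below times small synchronous writes to a scratch file using Python’s standard library. The file name, block size and sample count are arbitrary choices, and results will vary with the operating system, file system and underlying device.

```python
import os
import statistics
import time

# Minimal sketch: time small synchronous writes to a scratch file.
# PATH, block size and sample count are illustrative choices.
PATH = "latency_test.bin"
BLOCK = b"\0" * 4096  # 4KiB, a common I/O size
SAMPLES = 100

fd = os.open(PATH, os.O_CREAT | os.O_WRONLY)
latencies_ms = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    os.write(fd, BLOCK)
    os.fsync(fd)  # force the data to stable storage before timing stops
    latencies_ms.append((time.perf_counter() - start) * 1000)
os.close(fd)
os.remove(PATH)

print(f"average write latency: {statistics.mean(latencies_ms):.2f} ms")
print(f"worst of {SAMPLES} samples: {max(latencies_ms):.2f} ms")
```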

4. MTBF and TBW

Mean time between failures (MTBF) is a key reliability metric across most of industry, including IT.

For storage devices, this will usually mean the number of powered-on hours it will operate before failure. Failure in the case of storage media will normally mean data recovery and replacement, because drives are not repairable. Storage subsystems such as RAID arrays will have a different MTBF, because drives can be replaced.

A hard drive might have a typical MTBF of 300,000 hours, although newer technologies mean this can range up to 1,200,000 hours, or about 137 years of continuous operation.

Some manufacturers are moving away from MTBF. Seagate now uses the metric Annualized Failure Rate (AFR), which sets out to predict the percentage of drives that will fail in the field in a given year due to a “supplier cause” (so excluding customer-side issues, such as damage from a power outage).
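
The two metrics are related. Assuming a constant failure rate – the usual simplifying assumption in reliability modelling, not a supplier-specific formula – an MTBF figure implies an approximate AFR, as this sketch shows.

```python
import math

# Minimal sketch: the approximate AFR implied by an MTBF figure,
# assuming a constant (exponential) failure rate. This is a common
# modelling assumption, not any manufacturer's published method.
HOURS_PER_YEAR = 8_766  # 365.25 days

def afr_percent(mtbf_hours: float) -> float:
    """Annualised failure rate (%) implied by an MTBF figure."""
    return (1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)) * 100

for mtbf in (300_000, 1_200_000):
    print(f"MTBF {mtbf:>9,} hours -> AFR ~{afr_percent(mtbf):.2f}%")
# MTBF   300,000 hours -> AFR ~2.88%
# MTBF 1,200,000 hours -> AFR ~0.73%
```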

Solid-state storage systems, with their different physical characteristics, are also measured by endurance. Total terabytes written (TBW) sets out how much data can be written to a solid-state drive (SSD) over its lifespan. Drive writes per day (DWPD) expresses the same idea as the number of times the drive’s full capacity can be written each day over its warranty period. Manufacturers will usually state these metrics in their hardware warranties.
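
The two endurance ratings are different views of the same quantity. As a rough sketch, with illustrative capacity and warranty figures:

```python
# Minimal sketch: converting a DWPD rating to total terabytes written
# (TBW) over a warranty period. The capacity and warranty length below
# are illustrative values, not a specific product's specification.
def tbw(capacity_tb: float, dwpd: float, warranty_years: int) -> float:
    """TBW = capacity x DWPD x 365 x warranty years."""
    return capacity_tb * dwpd * 365 * warranty_years

# A hypothetical 3.84TB enterprise SSD rated at 1 DWPD for five years
print(f"{tbw(3.84, 1, 5):,.0f} TBW")  # ~7,008 TBW
```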

Endurance will vary by flash generation. Single-level cell (SLC) SSDs have generally been the most durable, with multi-level cell (MLC), triple-level cell (TLC) and quad-level cell (QLC) flash storing progressively more bits per cell and trading endurance for capacity. However, manufacturing techniques have improved the durability of all flash types through technologies such as enterprise multi-level cell (eMLC) designs.

5. Form factors and connectivity

Although not performance metrics per se, storage buyers will need to consider how equipment connects to the host system and shares data.

The typical form factor for laptops, now also common in storage arrays, is the 2.5in SSD, although larger 3.5in drive bays remain available for HDDs. These drives use Serial ATA (SATA) or, for enterprise applications, serial-attached SCSI (SAS) interfaces.

M.2 is a compact card format that connects directly to the host hardware, succeeding the PCI Express Mini Card-based mSATA standard. U.2 connectors are more commonly used on 2.5in SSDs and, unlike M.2, support hot-swapping.

NVMe is an interface that allows storage, usually NAND flash, to connect to a host’s PCIe bus; U.2 devices can also use the NVMe interface.
