Storage Performance Baseline with Diskspd


Deploying a new system requires a rigorous process to ensure stability and performance. The business sometimes pressures us to deploy new systems quickly, which unfortunately leads to skipping the crucial steps of stress testing and baselining. To keep the cycle time of that process to a minimum, automation is key. The more thorough your checklist is, the smaller the chance of surprises once your system is up and running with a mission-critical workload. Here’s a simple process I like to use in my environment that helps me ensure the deployed hosts are as reliable as possible while delivering predictable performance:

  1. Power Resiliency Test
  2. CPU Stress Tests (ex: y-cruncher)
  3. RAM Tests
  4. Network Tests (ex: NTttcp)
    1. Resiliency Tests
    2. Performance Tests
  5. Storage Tests
    1. Resiliency Tests
    2. Integrity Tests
    3. Performance Tests

The goal of this particular post is to focus on the storage performance testing piece of preparing a new system for rollout. To achieve this, we will use Microsoft’s preferred storage testing utility, diskspd. Some of you might have used SQLIO in the past for this, but Microsoft now officially recommends diskspd for most storage testing. The only exception that comes to mind at this point in time is Exchange: after a recent discussion with a PFE, he advised to keep using Jetstress for it.

You can obtain the compiled Diskspd binaries from TechNet. If you want the absolute latest version, you will need to download the source code from the GitHub repository and compile it yourself.

Testing your storage involves the following high-level process:

  1. Understand the targeted storage platform
  2. Understand the workload
  3. Implement the test cases for execution with diskspd
  4. Run test cases with diskspd
  5. Analyze the results
  6. Tweak and retest the storage platform if necessary
  7. Keep the final results of the tests for future reference

After covering this process, the post will then provide a concrete example of how you would tackle IO baselining for a SQL Server workload.

Understand the targeted storage platform

The first step in planning a good test suite for your storage is to understand it. This ensures that you pick the right tests to execute, which in turn will highlight the characteristics or quirks of your storage platform. For instance, if you are unaware of the size of your read cache and you end up using a test file that fits within the cache completely, you will obtain results significantly higher than what will happen in real life, as your actual data set is most likely larger than the read cache. You would then deploy your workload in production and end up with a bad surprise performance-wise. To give you a sense of what you might want to consider while reviewing your storage platform, here are some of the things that will influence how you test and the results you will obtain:

  • Number of disks available to your workload
  • The types of the disks in your setup
    • Spinning
    • Solid State: SAS/SATA, PCIe or NVMe?
  • The capabilities of each drive type/model in your setup
  • The capacity targeted
  • Is caching or automated block tiering used?
  • How big are the caches or each of the tiers?
  • Is deduplication involved? If so, is it inline or post-process?
  • Is compression enabled?
  • Is replication configured? If so, is it synchronous or asynchronous?
  • What kind of connectivity do you have to your storage?
    • Is it directly attached SAS?
    • Is it fibre channel?
    • Is it Ethernet based? (i.e. iSCSI, SMB, NFS)
    • If it’s Ethernet based is RDMA involved? i.e. SMB Direct
    • Is multi-pathing involved? i.e. SMB Multichannel or MPIO
    • How is multi-pathing used? Is the traffic load balanced or the other paths only used for failover purposes?
    • What kind of bandwidth is available to connect to the storage? 1Gbps/10Gbps/100Gbps Ethernet? 4G/8G/16G Fibre Channel? 6G/12G SAS? PCIe 4x/8x/x16?
    • If applicable, how much network latency is there between compute and the storage server/appliance?
  • How will the storage devices be configured from a fault tolerance standpoint?
    • What RAID level (or equivalent) will be used? RAID 10 performs differently from a RAID 5 for instance.
    • What stripe or interleave size will be used when configuring the logical disk?
  • How will the volume be formatted? Which file system will be used? What file system allocation unit size will be used?
  • If you are using a server based storage solution:
    • What kind of processors are installed in the storage system? (number of cores, clock speed, CPU generation/model)
    • How much memory is installed in the server?

As you can see, there’s quite a bit already to think about when designing a good test suite for your storage. Don’t worry, being thorough in all steps of the process does pay off in the end!

Understand the workload

Again, in the spirit of selecting the right tests to perform against your storage platform, you need to have a good understanding of the IO patterns of the applications that will run on top of it. You need to identify the following:

  • Is the IO of the application random or sequential?
  • What are the various IO block sizes generated by the application?
    • For instance, SQL Server can generate IO read operations ranging from 512 bytes to 128K in block sizes
  • Does the application have multiple threads to perform the IO? Is IO being performed concurrently on multiple files by different processes or threads?
  • What is your tolerance to IOs being queued or backlogged?
  • How big is your active data set?
    • If only 5GB out of 100GB is actively used in a SQL Server database, that will influence the design of your storage infrastructure, especially when caching or automated tiering is used.
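The last point is worth a quick back-of-the-envelope calculation. A simple way to see why the active-set-to-cache ratio matters is to blend cache and backend latency by hit ratio; the sketch below uses made-up illustrative latencies (0.1ms for cache, 10ms for spinning disk), not measurements:

```python
def effective_latency_ms(hit_ratio, cache_ms, backend_ms):
    """Blend cache and backend latency by the cache hit ratio."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * backend_ms

# A 5GB active set that fits a cache -> nearly all hits.
# A 100GB set touched uniformly -> mostly misses.
hot = effective_latency_ms(0.99, 0.1, 10)   # ~0.2 ms
cold = effective_latency_ms(0.10, 0.1, 10)  # ~9 ms
print(hot, cold)
```

Two orders of magnitude apart for the same hardware, which is exactly the surprise a cache-sized test file can hide.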

Implement the test cases for execution with diskspd

Once you have a solid understanding of the IO generated by your application, it’s now time to translate this into tests that can be run using diskspd.

Run test cases with diskspd

This step in the process should be straightforward. It boils down to launching the script built in the previous step and waiting for all the test cases to complete.

Analyze the results

Once you have run the tests using diskspd, you can start diving into the results to make sense of all the data diskspd generated. While doing so, you might want to keep an eye on a few important metrics to assess the performance of your storage subsystem:

  • Latency: The amount of time between the time an IO operation is initiated and the time when the IO is completed
  • Input/Output Operations Per Second (IOPS): The number of IO operations performed per second
  • Transfer rate (i.e. MB/s): How much data is being transferred per second

In other words, IOPS x IO block size = transfer rate. Another important thing to understand: at a queue depth of 1, if your latency increases by 50% (e.g. from 10ms to 15ms), your IOPS will drop by roughly a third, as every IO now takes 50% longer to complete. There’s usually a lot of confusion when the performance of a storage subsystem is reported in IOPS alone, because the IO block size is often not mentioned: 100 IOPS at 1K and 100 IOPS at 4K do not deliver the same throughput.
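These relationships are easy to sanity-check with a few lines of code; here is an illustrative Python sketch (the numbers are made up for the example):

```python
def transfer_rate_mb_s(iops, block_size_bytes):
    """Transfer rate = IOPS x IO block size (reported in binary MB)."""
    return iops * block_size_bytes / (1024 ** 2)

def qd1_iops(avg_latency_ms):
    """At a queue depth of 1, IOPS is bounded by 1 / average latency."""
    return 1000.0 / avg_latency_ms

# Same IOPS figure, very different throughput:
print(transfer_rate_mb_s(100, 1024))   # 100 IOPS at 1K -> ~0.10 MB/s
print(transfer_rate_mb_s(100, 4096))   # 100 IOPS at 4K -> ~0.39 MB/s

# Latency rising from 10ms to 15ms at queue depth 1:
print(qd1_iops(10))   # 100.0 IOPS
print(qd1_iops(15))   # ~66.7 IOPS
```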

Tweak and retest the storage platform if necessary

While testing your storage platform, you will most likely discover things that are not working as expected. This will result in configuration changes along the way. That’s fine, this is when you want to discover those things, not when the system is live with your users on it! The key things to understand here are the following:

  • Make changes incrementally
  • Test after each change to validate its effect

The goal here is to be methodical. If you introduce multiple changes at the same time, you run into the possibility that one of those changes causes a regression in performance. You don’t want to be in a position where one of the changes that improves performance by 10% is being cancelled by another that reduces it by 12% unexpectedly.

Keep the final results of the tests for future reference

This is probably the most straightforward part of the process but that doesn’t mean it’s not an important one. Keep your test results file in a place you will remember and/or that is easily discoverable by you and your peers. The goal here is to be able to pull those numbers quickly in the event of a performance issue. When that happens, simply re-run your automated performance tests and compare with the previous results. You’ll then be in a position to determine if there was some sort of performance degradation over time of your storage. If you don’t have that reference point, it will be impossible to determine this!
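The comparison itself is easy to script once results are kept around. As a sketch (assuming diskspd's default text output, where the combined totals appear on a line starting with `total:` and the fourth pipe-separated column is "I/O per s"), you could extract the headline IOPS from a saved result file and diff it against a fresh run; the two summary strings below are abbreviated examples, not real captures:

```python
def total_iops(diskspd_output: str) -> float:
    """Pull the combined I/O-per-second figure from a diskspd result file.

    Assumes the default text output, where the summary line looks like:
    total: <bytes> | <I/Os> | <MB/s> | <I/O per s> | <AvgLat> | ...
    """
    for line in diskspd_output.splitlines():
        if line.strip().startswith("total:"):
            fields = [f.strip() for f in line.split("|")]
            return float(fields[3])  # the "I/O per s" column
    raise ValueError("no 'total:' summary line found")

# Abbreviated example summary lines (baseline vs. a degraded re-run):
baseline = "total: 4702089216 | 1147971 | 149.42 | 38252.08 | 0.052 | ..."
today    = "total: 2350000000 |  573000 |  74.70 | 19100.00 | 0.105 | ..."

drop = 1 - total_iops(today) / total_iops(baseline)
print(f"IOPS down {drop:.0%} vs baseline")
```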

Example: SQL Server Performance Baselining

I will now guide you through a simple example of how you would apply the process above. Some of the SQL Server gurus out there will most likely find my example an oversimplification of what needs to be tested but I’m purposefully keeping the number of test cases short to lighten the reading of this post.

Understand the targeted storage platform

In our example, let’s consider the following to keep things simple:

  • There are two SATA drives supporting the volume, one SSD and one spinning disk, delivering 50,000 IOPS and 100 IOPS respectively as per their specs
  • The disks are directly attached to the server
  • The particular computer has 4 cores with Hyper-Threading enabled

Understand the workload

The main application that will run on the SQL Server instance is an OLTP-type application, meaning the IOs are mostly random reads with a bit of sequential writes for log operations. SQL Server will therefore generate the following types of IOs:

  • Log writes, which can vary between 512 bytes and 64KB. For the sake of simplicity we’ll only test 4K (typical cluster size) and 64KB.
  • Standard reads, which can range between 8KB and 128KB. For the sake of simplicity we’ll only test the minimum and maximum IO sizes, 8KB and 128KB.
  • The SQL Server instance leverages the two cores assigned to the VM
  • The database is expected to be around 100GB in size

Implement the test cases for execution with diskspd

Based on our understanding of the workload and the hardware, we need to create diskspd tests that generate the random reads typical of SQL Server and the sequential writes of log operations. This translates to the following tests:

  • 4KB sequential write using 2 threads with 1 outstanding IO
  • 64KB sequential write using 2 threads with 1 outstanding IO
  • 8KB random read using 2 threads with 1 outstanding IO
  • 128KB random read using 2 threads with 1 outstanding IO

In diskspd lingo, that would look like the following:

diskspd.exe -c100G -t2 -si4K -b4K -d30 -L -o1 -w100 -D -h H:\testfile.dat > 4K_Sequential_Write_2Threads_1OutstandingIO.txt
diskspd.exe -t2 -si64K -b64K -d30 -L -o1 -w100 -D -h H:\testfile.dat > 64KB_Sequential_Write_2Threads_1OutstandingIO.txt
diskspd.exe -r -t2 -b8K -d30 -L -o1 -w0 -D -h H:\testfile.dat > 8KB_Random_Read_2Threads_1OutstandingIO.txt
diskspd.exe -r -t2 -b128K -d30 -L -o1 -w0 -D -h H:\testfile.dat > 128KB_Random_Read_2Threads_1OutstandingIO.txt

Here’s a brief explanation of the switches used here:

-c : Creates a 100GB test file. Note this switch is only used in the first test, as all subsequent tests reuse the file created by the first one.
-si : Specifies the interlocked sequential increment, which is required when performing sequential IO tests using multiple threads to keep the IO as sequential as possible.
-t : The number of threads that will be used to perform IO
-b : The size of the IO operations that will be performed
-d : The duration of the test in seconds
-L : Enables latency measurement
-o : The number of outstanding IOs in the queue while testing
-w : The percentage of writes that will be performed during the test
-D : Captures IOPS statistics over the specified interval; by default this is every second
-h : Disables hardware and software caching. It’s usually best practice to disable both while testing storage for a SQL Server workload because of the way SQL Server performs IO.
H:\testfile.dat : The path of the test file in which the IO will be performed
> “Name of file”.txt : Redirects the output of diskspd to a text file for future analysis and reference
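Since the four command lines only differ by a handful of switches, they lend themselves well to being generated rather than hand-typed. Here is a minimal Python sketch of that idea (the file names and the H:\testfile.dat target are simply this post’s example values):

```python
# Build the diskspd command lines for the four test cases described above.
# Each entry: (label, extra flags, block size, write percentage).
TESTS = [
    ("4K_Sequential_Write",   ["-c100G", "-si4K"],  "4K",   100),
    ("64KB_Sequential_Write", ["-si64K"],           "64K",  100),
    ("8KB_Random_Read",       ["-r"],               "8K",   0),
    ("128KB_Random_Read",     ["-r"],               "128K", 0),
]

def build_command(label, extra, block, write_pct,
                  threads=2, outstanding=1, duration=30,
                  target=r"H:\testfile.dat"):
    parts = ["diskspd.exe", *extra,
             f"-t{threads}", f"-b{block}", f"-d{duration}",
             "-L", f"-o{outstanding}", f"-w{write_pct}", "-D", "-h",
             target]
    out_file = f"{label}_{threads}Threads_{outstanding}OutstandingIO.txt"
    return " ".join(parts) + f" > {out_file}"

for test in TESTS:
    print(build_command(*test))
```

Generating the commands this way also makes it trivial to expand the matrix later, for example to add higher queue depths.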

Run test cases with diskspd

Once your diskspd commands have been prepared and saved in a batch or PowerShell script file, simply run the tests. I recommend running the test suite a few times to confirm the results are consistent between runs.

Analyze the results

Now that we have run all of our tests, it’s time to make sense of the results. To do this, we’ll look at each section of the output individually.

Command Line: C:\Codeplex\Git\GEMAutomation\InfrastructureTesting\diskspd.exe -a0,1 -t2 -si4K -b4K -d30 -L -o1 -w100 -D H:\testfile.dat

Input parameters:

	timespan:   1
	-------------
	duration: 30s
	warm up time: 5s
	cool down time: 0s
	measuring latency
	calculating IOPS stddev with bucket duration = 1000 milliseconds
	random seed: 0
	advanced affinity: 0, 1
	path: 'H:\testfile.dat'
		think time: 0ms
		burst size: 0
		using software and hardware write cache
		performing write test
		block size: 4096
		using interlocked sequential I/O (stride: 4096)
		number of outstanding I/O operations: 1
		thread stride size: 0
		threads per file: 2
		IO priority: normal

This section is useful to confirm exactly which test was run for this particular result file. We can see where the test file was located, the IO block size used, the number of threads, whether the test was a read or a write test, and the test duration.

Results for timespan 1:
*******************************************************************************

actual test time:	30.01s
thread count:		2
proc count:		8

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  17.39%|   2.29%|   15.10%|  82.57%
   1|  18.01%|   4.53%|   13.48%|  81.90%
   2|   6.87%|   1.93%|    4.95%|  93.09%
   3|   1.30%|   0.52%|    0.78%|  98.71%
   4|   8.38%|   0.73%|    7.65%|  91.58%
   5|   4.06%|   0.52%|    3.54%|  95.85%
   6|   2.81%|   0.57%|    2.24%|  97.15%
   7|  36.65%|   1.09%|   35.56%|  63.31%
-------------------------------------------
avg.|  11.94%|   1.52%|   10.41%|  88.02%

This is the first section that starts to show interesting data. Here you can see how many threads were used and how many processors were available at the time of the test. Following this, you can see a summary of CPU usage while the test was running. In this particular run on my PC, the two cores running the test threads were a little busier than the others. As you can see, there’s still plenty of CPU headroom to handle the IO. If the application on the SQL Server instance is CPU intensive, you might have to pay special attention to this, as SQL Server might fight with the IO threads for CPU.

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | IopsStdDev | LatStdDev |  file
------------------------------------------------------------------------------------------------------------------
     0 |      2371416064 |       578959 |      75.36 |   19291.76 |    0.052 |    3995.09 |     3.676 | H:\testfile.dat (100GB)
     1 |      2330673152 |       569012 |      74.06 |   18960.31 |    0.053 |    3924.21 |     3.706 | H:\testfile.dat (100GB)
------------------------------------------------------------------------------------------------------------------
total:        4702089216 |      1147971 |     149.42 |   38252.08 |    0.052 |    7627.01 |     3.691

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | IopsStdDev | LatStdDev |  file
------------------------------------------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 |    0.000 |       0.00 |       N/A | H:\testfile.dat (100GB)
     1 |               0 |            0 |       0.00 |       0.00 |    0.000 |       0.00 |       N/A | H:\testfile.dat (100GB)
------------------------------------------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00 |    0.000 |       0.00 |       N/A

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | IopsStdDev | LatStdDev |  file
------------------------------------------------------------------------------------------------------------------
     0 |      2371416064 |       578959 |      75.36 |   19291.76 |    0.052 |    3995.09 |     3.676 | H:\testfile.dat (100GB)
     1 |      2330673152 |       569012 |      74.06 |   18960.31 |    0.053 |    3924.21 |     3.706 | H:\testfile.dat (100GB)
------------------------------------------------------------------------------------------------------------------
total:        4702089216 |      1147971 |     149.42 |   38252.08 |    0.052 |    7627.01 |     3.691

Now we’re getting into the meat of the subject. The three tables above show total IO (reads and writes combined), read IO only, and write IO only. Let’s look at each of the columns:

  • thread: The number of the thread that is generating IO
  • bytes: The total number of bytes transferred for the test
  • I/Os: The total number of IO operations performed for the test
  • MB/s: The throughput in MB per second
  • I/O per s: The number of IO operations per second
  • AvgLat: The average latency of all the IO operations for the test
  • IopsStdDev: The standard deviation of the IO operations per second
  • LatStdDev: The standard deviation of the latency encountered for the test
  • File: The path of the file used in the IO test

If we look at the total section, you can see that we reached a total of 149MB/s, giving us 38,252 IO/s. If you take 38,252 IOPS x 4K (the IO block size), that gives you 149MB/s; as you can see, there is no magic in those numbers! On the latency side, we had an average latency of 0.05ms with a standard deviation of 3.6ms. You might have noticed there is quite a bit of deviation when you put that in perspective with the average latency. The next piece of information will help us understand this better.
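You can verify this arithmetic yourself from the raw totals in the table above (diskspd reports MB/s in binary megabytes):

```python
# Raw totals from the "Total IO" table above.
total_bytes = 4_702_089_216
total_ios   = 1_147_971
test_secs   = 30.01

iops = total_ios / test_secs               # ~38,250 IO/s
mb_s = total_bytes / test_secs / 1024**2   # ~149.4 MB/s

# Every IO moved exactly one 4 KiB block:
assert total_bytes == total_ios * 4096

# IOPS x block size matches the reported throughput:
assert abs(iops * 4096 / 1024**2 - mb_s) < 1.0
print(round(iops), round(mb_s, 1))
```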

  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |        N/A |      0.002 |      0.002
   25th |        N/A |      0.006 |      0.006
   50th |        N/A |      0.006 |      0.006
   75th |        N/A |      0.007 |      0.007
   90th |        N/A |      0.009 |      0.009
   95th |        N/A |      0.011 |      0.011
   99th |        N/A |      0.023 |      0.023
3-nines |        N/A |      0.083 |      0.083
4-nines |        N/A |    254.177 |    254.177
5-nines |        N/A |    389.594 |    389.594
6-nines |        N/A |    983.266 |    983.266
7-nines |        N/A |    983.268 |    983.268
8-nines |        N/A |    983.268 |    983.268
    max |        N/A |    983.268 |    983.268

This represents the latency histogram captured while running the test. If you are a bit statistically impaired as I am, I suggest reading up on percentiles and histograms and how they are used within the realm of performance analysis. If you look at the 99.9th percentile (3-nines), you see some IOs had a latency of 254ms(!), which is really high considering Microsoft recommends latency between 1-5ms for log and 4-20ms for data. However, put into perspective, that only accounts for about 1,147 IOs, or 0.1% of all the IO performed (1,147,971 x (1-0.999)), which is most likely very acceptable.
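To see how a tiny fraction of slow IOs can hide below the high percentiles, you can reproduce the percentile math on synthetic data. The sample below is invented for illustration (99.9% fast IOs, 0.1% very slow ones), and the percentile function is a simple nearest-rank sketch, not diskspd's exact algorithm:

```python
# 10,000 synthetic IO latencies in ms: 9,990 fast, 10 pathologically slow.
latencies = [0.006] * 9990 + [254.0] * 10

def percentile(samples, p):
    """Simple nearest-rank percentile of a list of samples."""
    s = sorted(samples)
    k = min(len(s) - 1, int(len(s) * p / 100))
    return s[k]

avg = sum(latencies) / len(latencies)
print(f"average: {avg:.3f} ms")                      # tail inflates the mean
print(f"99th   : {percentile(latencies, 99)} ms")    # still fast
print(f"99.9th : {percentile(latencies, 99.9)} ms")  # the slow tail appears
```

Note how the 99th percentile looks perfectly healthy while the 99.9th exposes the outliers, which is exactly why diskspd reports the full n-nines table.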

Tweak and retest the storage platform if necessary

While reviewing the results, you might find oddities. There are a variety of reasons why this might happen; the following is a good place to start when investigating those kinds of problems:

  • OS issues
    • Incorrect MPIO policy with Storage Spaces can greatly reduce the throughput of the solution
    • Incorrect file system allocation unit size can cause more data to be read from disk than the size of the IO generated or generate too many IO operations
  • Hardware issues
    • Outdated firmware on the HBA/controller or disk
    • Bad cabling
    • Bad disk. To discover this issue, you can look at the System log in the Windows Event Viewer, or use the Get-StorageReliabilityCounter cmdlet in PowerShell. You might also want to take a look at a function I wrote called Get-PhysicalDiskReliabilityCounters, which combines information from Get-PhysicalDisk, Get-StorageEnclosure and Get-StorageReliabilityCounter in an easy-to-consume output. It can also collect the data from remote systems if necessary.
    • Flaky disk connections. This happened to us with certain SSDs at specific locations within a large disk enclosure.
    • Overheating HBA/controller causing thermal protection to kick in (believe it or not, that happened too!)

Once you have identified and applied a potential fix for your issue, make sure you run your full test suite to validate whether the issue is resolved. You may have to repeat this several times until you reach the final solution.

Keep the final results of the tests for future reference

I suggest you keep all the test results, including the bad ones, as you might have to compare a problematic situation in production with a behavior encountered while testing. For example, if you encounter a high-latency issue while testing and for some reason that problem crept back in production, it might be useful to review the conditions that caused it, which will ultimately lead you to the final solution.

Conclusion

As you saw throughout the article, testing a storage solution can be a lengthy process, but it is absolutely worth the time spent to ensure your application performs smoothly as expected. By breaking down the process into steps that are easy to understand and adapt to your situation, you will be able to devise the right testing solution for your needs. Should you have any questions or comments, feel free to comment on the post! In a future post, I will explain how to automate certain tasks in the storage testing process using PowerShell to simplify and accelerate the exercise. Stay tuned!

 

9 thoughts on "Storage Performance Baseline with Diskspd"

  • Quinn Gudex says:

    Mathieu,

    Thank you for the detailed example for SQL Server. I appreciate observing the thought process involved in preparing the test parameters. However, I did have a couple questions: 1) how did you determine the number of threads used during the test? I’ve seen others recommend setting this value equal to the number of CPU cores available to SQL, but you chose a lower number. 2) How did you determine the IO queue length?

    Quinn

    • Hi Quinn,

      To answer your first question, the general recommendation is the right one: test with the number of cores allocated to SQL or your other workload. I chose a lower core count while running diskspd in an attempt to artificially highlight the CPU usage on the cores involved in the tests. At that IO level it’s not very significant, but on larger systems doing small IO, CPU can become a bottleneck.

      In regards to the queue length, it’s usually good to test at a queue length of 1, as that typically represents a worst-case scenario. At higher queue depths, the OS/disk can perform optimizations (i.e. opportunistically reordering IO) that yield better performance, so you can take the queue-depth-1 result and expect at least that in real life. I personally run tests at a queue depth of 1 and at something higher like 8 or 32 outstanding IOs to get the range of capabilities of the particular array I’m testing. If you’re on a tight timeline and can’t afford to test several permutations, I’d start with a queue depth of 1.

      Let me know if that answers your questions!

      Mathieu

