VMware 10G NIC Performance Evaluation

For the last few years we have been running IBM x3850 X5 servers in our cloud, with Emulex 10G NICs for all of our networking needs. One of our services was starting to run out of steam, so we picked up a pair of HP DL580 G8 servers for the needed horsepower. These servers were configured with Intel-based 10G NICs. At the time of purchase, our main concern was getting maximum CPU performance, and minimal attention was paid to 10G NIC selection.

As we put these servers into production, we noticed that the VMware iSCSI performance on the new HP hosts seemed especially good. The E7-8891v2 Xeons undoubtedly play a large role in the improved performance, but we began to wonder how much of the improvement could be attributed to the 10G NICs. Not one to let such thoughts sit idle for long, I asked our friends at Zones if we could borrow a few 10G NICs for performance evaluation. Within a few weeks I had dual-port Intel and Mellanox 10G NICs in hand for testing, in addition to the Emulex NICs already present in our IBM hosts.

Before I get too deep into the testing and results, a few words on the environment. The environment is a VMware 5.5-based cloud delivering hosted applications to our customers. Backing that cloud is an iSCSI SAN, and all networking is 10G, running on Force10 hardware. We use the VMware software iSCSI initiator for storage connections and the VMware Distributed vSwitch for VM networking. In addition to the iSCSI traffic, the VM networking is divided between HTTP, CIFS and database connections. The environment is configured to use a 9k MTU for all traffic.

To evaluate NIC performance, two tests were created, one for each of the two usage models. One of our IBM x3850 X5 servers was removed from service and its two Emulex cards were supplemented with the Intel and Mellanox cards.

To test iSCSI performance, a Windows Server 2008 R2 VM was created with two disks, C: and E:. A typical dataset from one of our most popular applications was selected (approx. 10 GB, 18,000 files). A script was written to copy the selected files from drive C: to drive E:, then from drive E: to a new location on drive C:, and finally to clean up all newly created folders and files. Xcopy was used to copy the data. The script recorded the start time, completion time and total runtime, and total runtime was used as the performance measure. CPU utilization was initially considered as well, but during initial testing it was indistinguishable from the VM workload itself.

For the iSCSI test, the VMware iSCSI stack was configured with two iSCSI interfaces on a single standard vSwitch. During the course of testing, the iSCSI vmkernel ports were mapped to the different vendor NICs. A 9k MTU and round robin load balancing were enabled. The ideal NIC would complete the test in the fastest time.
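The copy-and-time procedure described above can be sketched roughly as follows. This is not the original script, which used Xcopy on Windows; Python's `shutil.copytree` is swapped in here so the sketch runs anywhere, and the function name `timed_copy_run` and the tiny generated dataset are assumptions made for illustration only.

```python
import shutil
import tempfile
import time
from datetime import datetime
from pathlib import Path


def timed_copy_run(source: Path, stage: Path, final: Path) -> dict:
    """Copy source -> stage -> final, clean up, and report timings."""
    started = datetime.now()
    t0 = time.perf_counter()

    shutil.copytree(source, stage)  # first leg, e.g. C: -> E:
    shutil.copytree(stage, final)   # second leg, e.g. E: -> new folder on C:

    # Cleanup of everything the run created is part of the timed work.
    shutil.rmtree(stage)
    shutil.rmtree(final)

    elapsed = time.perf_counter() - t0
    return {
        "start": started,
        "end": datetime.now(),
        "runtime_seconds": elapsed,
    }


if __name__ == "__main__":
    # Small generated dataset (hypothetical) so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "dataset"
        src.mkdir()
        for i in range(20):
            (src / f"file_{i}.dat").write_bytes(b"x" * 4096)
        result = timed_copy_run(src, root / "stage", root / "final")
        print(
            f"start={result['start']:%H:%M:%S} "
            f"end={result['end']:%H:%M:%S} "
            f"runtime={result['runtime_seconds']:.3f}s"
        )
```

In a real run, `source`, `stage` and `final` would point at folders on the two virtual disks, so that each leg crosses the storage path being measured.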

To test VM networking, the same dataset and script were used, but the data was copied from the VM’s C: drive to a file server, then copied back to C:, and finally cleaned up both locally and remotely. For this test, overall time to completion and average CPU utilization as observed by vCenter were used as the measures. Average CPU utilization was determined by exporting the VM’s real-time CPU performance data as a CSV. The data was imported into MS Excel and the average CPU utilization was calculated by averaging all entries from the start until the completion of the run. The test was repeated three times for each of the e1000 and vmxnet3 virtual NIC drivers, and each driver was tested with both a 1500 MTU and a 9000 MTU. The ideal NIC would complete the test in the fastest time with the lowest CPU utilization.
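The averaging step was done in Excel, but the same calculation can be sketched in a few lines of Python. The column names `Time` and `Usage`, the ISO timestamp format and the sample rows below are all assumptions for illustration; a real vCenter export will likely use different headers and would need the names adjusted.

```python
import csv
import io
from datetime import datetime


def average_cpu(csv_text, start, end, time_col="Time", cpu_col="Usage"):
    """Mean of cpu_col over rows whose timestamp lies in [start, end]."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stamp = datetime.fromisoformat(row[time_col])
        if start <= stamp <= end:
            samples.append(float(row[cpu_col]))
    if not samples:
        raise ValueError("no samples inside the run window")
    return sum(samples) / len(samples)


# Made-up sample rows, only to show the shape of the calculation.
SAMPLE = """Time,Usage
2014-01-01T10:00:00,5.0
2014-01-01T10:00:20,40.0
2014-01-01T10:00:40,60.0
2014-01-01T10:01:00,50.0
2014-01-01T10:01:20,4.0
"""

if __name__ == "__main__":
    avg = average_cpu(
        SAMPLE,
        start=datetime(2014, 1, 1, 10, 0, 20),
        end=datetime(2014, 1, 1, 10, 1, 0),
    )
    print(f"average CPU during run: {avg:.1f}%")
```

Restricting the average to the window between the script's recorded start and completion times keeps idle samples before and after the run from diluting the result.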

Test hardware

  • IBM x3850 X5 (7143), 2x E7-4870 processors, 512 GB RAM
  • Emulex OneConnect 10Gb OCE11102-NX (IBM 49Y4250)
  • Intel X520 (IBM 49Y7960)
  • Mellanox ConnectX EN 10GigE MT26448 (IBM 81Y9990)

iSCSI Test results

Five separate iSCSI tests were performed. The first test was with both virtual disks on local SSD storage (ssd-ssd). This is a benchmark to show optimal performance and does not involve the networking or iSCSI stack. The average completion time for this benchmark is 4:22 (4 mins, 22 secs).

The second test had one virtual disk on local storage and the second on iSCSI SAN storage (ssd-p2). The third test had both virtual disks on iSCSI SAN storage (p2-p2). Completely by accident, I discovered that there is a significant impact on storage performance when the VM has a snapshot. The fourth test quantified the impact of one snapshot against the VM (p2-p2-snap). The final test, performed only against the Mellanox NIC, added a second snapshot to the VM (p2-p2-2xSnap).

In the chart below, the Mellanox clearly outperforms both the Intel and the Emulex cards. In the p2-p2 test, the Mellanox completes the run 21.8% faster than the Emulex and 26.7% faster than the Intel NIC.

Figure: iSCSI Performance
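For readers who want to reproduce a "percent faster" comparison from their own run times, here is a minimal sketch. It assumes the figure is the reduction in completion time relative to the slower card; the post does not state which convention was used, so treat the formula as one common interpretation. The helper names are hypothetical, and the 500 and 400 second values are placeholders, not measured results.

```python
def parse_mmss(text: str) -> int:
    """Convert a 'minutes:seconds' string such as '4:22' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Reduction in completion time relative to the baseline, in percent."""
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100


if __name__ == "__main__":
    # The ssd-ssd benchmark time quoted in the text, converted to seconds.
    print(parse_mmss("4:22"))
    # Placeholder times only: a 400 s run against a 500 s baseline.
    print(f"{percent_faster(500, 400):.1f}% faster")
```

The other common convention compares throughput instead (baseline time divided by candidate time, minus one), which yields a larger number for the same pair of runs, so it is worth stating which one a chart uses.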

VM Networking Test Results

For each of these tests, the Distributed vSwitch NIC mapping was changed to the appropriate vendor NIC. The VM’s virtual disks were parked on the server’s local SSD storage. We still have many VMs configured with the e1000 virtual NIC driver, but all new VMs are being built with the vmxnet3 driver, so I tested both. All production VMs are configured for a 9000 MTU, but I also tested the standard 1500 MTU for comparison.

In the chart below, the vertical axis shows average CPU utilization during the test as recorded by vCenter for the VM. The horizontal axis shows the total runtime for the test. I have linked the datapoints for each NIC, starting with the e1000 1500 MTU, then vmxnet3 1500 MTU, e1000 9000 MTU and finally, at the bottom, vmxnet3 9000 MTU. Best overall performance is closest to the origin (Mellanox vmxnet3, 9000 MTU).

Figure: VM Networking Performance

In each of the tests, the Mellanox outperformed the competitors in time to completion. However, the Emulex showed a lower average CPU utilization when using the 1500 MTU. Oddly, the Intel NIC's performance was not even close, and it showed minimal improvement when using vmxnet3 or the 9000 MTU.

Conclusions and Next Steps

In our environment and with our workload, the Mellanox 10G NIC clearly outperforms the Intel and Emulex competition on our iSCSI workloads. For VM networking, it is clear that the Intel NIC is not the right one for the job on the IBM hosts. Oddly, this is the NIC running in our new HP hosts, although the firmware differs: the card tested here carries IBM-branded/optimized firmware, while the cards in the HP hosts carry HP-branded/optimized firmware.

Based on these results, all of our future 10G NICs will be Mellanox, and as budget allows, we will consider replacing the existing Emulex NICs.

On our HP servers, we are considering a mini bake-off to compare the HP branded Intel NICs with an HP branded Mellanox. If we find the same performance advantage there, we will also prepare to replace those.

Final Note

Some testing was also done with large file copies and Iometer. When copying a large 10 GB data file between the file server and the test VM, all three cards showed similar CPU utilization and sustained performance, though the Mellanox did have lower CPU utilization. Most likely something in the file server was maxing out at 350 MB/s, as it was possible to sustain this transfer rate whether writing to local SSD or the iSCSI SAN. With Iometer, all cards were able to demonstrate transfer rates of more than 500 MB/s for iSCSI. However, since neither of these tests is representative of our production workload, I did not consider them in this evaluation and did not formalize my results.

