Tag Archives: HP

VMware 10G NIC Performance Evaluation

For the last few years we have been running IBM x3850 x5 servers in our cloud, with Emulex 10G NICs for all of our networking needs. One of our services was starting to run out of steam, and we picked up a pair of HP DL580 G8 servers for the needed horsepower. These servers were configured with Intel-based 10G NICs. At the time of the purchase, our main concern was to get the maximum CPU performance, and minimal attention was paid to 10G NIC selection.

As we put these servers into production, we noticed that the VMware iSCSI performance on the new HP hosts seemed especially good. The E7-8891v2 Xeons undoubtedly have a large role to play in the improved performance, but we began to wonder how much of the performance improvement could be attributed to the 10G NICs. Not one to let such thoughts sit idle for long, I asked our friends at Zones if we could borrow a few 10G NICs for performance evaluation. Within a few weeks I had dual port Intel and Mellanox 10G NICs in hand for testing, in addition to the Emulex NICs already present in our IBM hosts.

Patch, Patch and Patch Some More

As some of you know, I look after a datacenter dedicated to providing a specialized cloud service for the printing industry. We use servers and infrastructure similar to those of another datacenter I know. This includes IBM x3690 and x3850 enterprise grade servers, 10G networking, VMware virtualization, etc. These servers do not have the fastest processors, but they have awesome on-line out-of-band diagnostics, can handle terabytes of RAM and will have replacement parts delivered PDQ. Ours have been running for the past two years without failure or data loss.

In my talks with the manager of the other datacenter, I have been hearing complaints about these servers. Frequent crashes, data loss, disk failures and more. Huh? They have the same hardware, and are running a newer VMware release (5.1) than we are (5.0). Digging deeper, I began to see a pattern. Out of date patches. I usually run no more than 4 months behind on host patches, but looking at their systems, most of them are still at the original release level. But it goes deeper than that.

In a virtual datacenter, there are five levels of patches:

  • Application
  • Operating System
  • Hypervisor (VMware, Hyper-V, Xen, etc.)
  • Firmware
  • BIOS

Most users only see the top two levels, through application updates, Windows service packs and the like. However, when you run a virtual datacenter, the bottom three levels become very important. A VMware host may run up to 100 virtual machines. An outage on this host affects all of the machines it hosts. Making sure the host is operational becomes Very Important.

VMware and Microsoft both provide simple mechanisms to keep the virtualization environment up to date. Microsoft does this via the Windows Update function. VMware provides a free add-on and plugin called Update Manager. Both of these tools are very easy to use and only require a brief maintenance window in which to update the hypervisor and reboot the host.

When you buy a host and its accessories from HP or IBM (I assume also from Dell), you can use the update tools they provide to bring the firmware for all the accessories up to date with a single reboot. IBM provides UpdateXpress and HP provides a new HP Service Pack for ProLiant utility. Both of these tools can be configured to bring the host system up to date with the latest fixes for all of the RAID controllers, disks, network interfaces, BIOS, remote management, etc.

All of the updates provided by HP and IBM come with release notes. Think back to the last time your server crashed. If you think it was storage related, find out the firmware level of your RAID controller and find the latest firmware update for the controller. Read through the release notes – you should not be surprised to see that the issue you ran into has already been patched… but you are not running the fix on your server. Time Bomb? You bet! No, really, you are betting with your paycheck.

So.. Patch! Every three to four months patch completely. Find the tools, learn them and use them. It is better to take a controlled, scheduled outage than one right in the middle of the day with data loss.
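
If you want a quick way to see which hosts are overdue, something like the following could work. This is only a rough sketch: the `stale_levels` function, the 120-day threshold (my approximation of "four months") and the sample host dates are all made up for illustration, and you would have to collect the real last-patched dates from your own inventory.

```python
from datetime import date

# The five patch levels listed above, top to bottom.
PATCH_LEVELS = ["Application", "Operating System", "Hypervisor", "Firmware", "BIOS"]

# Assumption: "every three to four months" is approximated as 120 days.
MAX_AGE_DAYS = 120


def stale_levels(last_patched, today, max_age_days=MAX_AGE_DAYS):
    """Return the patch levels whose last update is older than max_age_days.

    last_patched maps a level name to the date it was last patched.
    A level missing from the mapping is treated as never patched.
    """
    stale = []
    for level in PATCH_LEVELS:
        patched_on = last_patched.get(level)
        if patched_on is None or (today - patched_on).days > max_age_days:
            stale.append(level)
    return stale


# Hypothetical host: the top two levels are current, the rest are not.
host = {
    "Application": date(2014, 5, 1),
    "Operating System": date(2014, 4, 15),
    "Hypervisor": date(2013, 9, 1),
    "Firmware": date(2012, 6, 1),
    # No BIOS entry: nobody recorded when it was last flashed.
}

print(stale_levels(host, today=date(2014, 6, 1)))
```

Anything this prints is a candidate for your next scheduled maintenance window.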

Good luck! Don’t let that unscheduled outage kick you in the a$$!