Monthly Archives: November 2013

Never Trust a Schematic Blindly

Last week I posted a long-winded video on using an oscilloscope to evaluate the performance of two different transistors. The 2N2222 was behaving oddly at the top end, and the MPSA42 had a rather significant turn-on delay.

After discussing my findings with someone who knows much more about these things than I do, I was advised to take another look at the circuit; the A42's performance did not seem correct. (It turns out that before being used as a video driver transistor, its 300 V capability was used to switch tubes.) Below is the circuit I used as a template in my testing. I used an FET I had on hand and replaced the 2N3903 first with a 2N2222 and later with an MPSA42.

Simple PWM FET driver circuit

On a hunch I decided to replace R1 with a 10k trimmer. Bringing R1 down to about 300 ohms fixed the delay issue on the A42 and significantly squared up the corners for both transistors.

Moral of the story… don’t blindly trust someone else’s schematic to work for you. Test. Measure. Evaluate. Fix and re-test. For my needs, replacing Q2 with a 2N2222 and R1 with a 330 ohm resistor generates the perfect PWM driver. YMMV.

Nice Square Waves

Using an Oscilloscope to Select a Transistor for a PWM Project

I recently saw some discussion online (I think on hackaday.com) regarding the merits of an oscilloscope in the modern hacker’s lab.
As I was working on a DC motor driver using Pulse Width Modulation (PWM), I was getting odd behavior from the circuit when approaching full power output. In the attached video, I discuss the role an oscilloscope played in helping select an appropriate transistor for the PWM circuit. Specifically, I show how two different transistors behave in the same circuit.

A VMware Replication Performance Secret

In datacenters, Disaster Recovery is not just a concept; it is something we must take into consideration every day. The company and its customers make money only if the services we provide are operational. We try to make everything as stable and secure as possible, but we also have to plan for the worst: what if the building is gone tomorrow? For this contingency, we usually identify a location FarFarAway that is not subject to the same risks as the primary location. EZ. Find a location, sign a contract, install and configure the environment, and that’s it, right? ummm… nope. Now comes the hard part.

The hard part of DR has always been replicating data from the production site to the recovery site. You have to make sure all of the data gets across to the recovery site on time and consistently. Since we are a rather large VMware shop and the entire production environment is now VMs, all we have to do is replicate the VMs and their additional data to the recovery site.

We use a number of tools for replication:

  • SAN based replication (Compellent)
  • vRanger
  • VMware’s Site Recovery Manager (in evaluation at the moment)
  • Offsite replication of our daily backups (Quantum DXi)

The SAN replication moves the data that is not encased in a VM (in our case, file server data). vRanger and VMware’s SRM replicate the VMs. vRanger is cheap, but it is poorly integrated with vCenter and has a rather weak understanding of Recovery Point Objectives. SRM is awfully expensive, but it can be managed from the vSphere client and will plan the replication schedule to make sure the replica is never older than you specify.

All of these are great tools, but I had been banging my head against the wall trying to identify why the replication processes were unable to make full use of the available bandwidth. SAN-to-SAN replication was just fine, but vRanger and SRM were struggling. I tested various TCP performance enhancements, opened cases with VMware, etc. Nothing helped until we installed a shiny new Compellent SAN in the DR location.

By the time the new SAN came in, I was getting quite frustrated with the DR replications. Every night at least 10% of the replication processes would fail. Some required a restart, others were in such a bad state the replica VMs had to be wiped and re-synced. It was bad. Really bad. When the new SAN was installed, the first thing I did was move a few of the troublesome replications to the new SAN. I nearly fell out of my chair when I looked at the bandwidth stats! The replications were running flat out, using the entire capacity of the WAN link. This had never happened before! All of the replicas are now running to the new SAN and the replication failure rate is less than 1%. Yes!

As it turns out, recovery-site storage performance is critical in reducing the latencies in the replication process. I had been using an old hand-me-down Compellent, as well as a Quantum DXi, as the target storage devices. The IOPS performance of the new SAN beats both of these devices by at least an order of magnitude, and it shows in the replication performance.

So, if you are having VM replication performance issues, take a look at your DR site SAN. It might not be good enough.

VMware – VM loses network connectivity after vMotion

The VMware virtualization environment is a very flexible and powerful tool for delivering a variety of computing services. The flexibility this product has brought to the IT world has made many nearly impossible tasks simple and error-free. One of its more unique features is vMotion: moving a running virtual machine from one physical host to another without downtime. Cool!

All great until it fails.

Just the other day, I saw a situation where a VM was moved from one host to another and disappeared from the network. Check vCenter: everything is running. Even other VMs on the same networks are working fine. What?

This is a case of stale forwarding-table entries. Each network interface has a unique hardware (MAC) address, and every switch keeps a table mapping those addresses to specific switch ports. If you move a device from one port to another on the same switch, the switch quickly notices the interface has moved and re-learns its location. On one switch, easy. When you daisy-chain switches, each switch also keeps a record of all the addresses reachable through its neighbors, so the more switches between the VM’s initial location and its final location, the longer the stale entries take to age out. A typical aging timeout is 5 minutes per switch.

It is this multi-switch configuration that caused the problem we saw. Switches were originally designed for rare changes in network topology; virtualization has changed that. When VMware moves a VM from one host to another in a large environment, the machine might land on a switch many hops away from the original one. Depending on switch configurations, it can take 5-30 minutes before the switches discover where the moved VM has shown up. OUCH! That is an outage!

So how do you fix it? In the moment, flush the ARP/MAC caches on all the switches as soon as you notice the problem. For a long-term fix, see the suggestions below:

For a Cisco switch environment, disable ARP caching and enable ARP masquerading.

For a Force10 switch, run this command: mac-address-table station-move refresh-arp

Good luck!

Make good with Dell/Compellent

Some months ago, the cloud service provider I work for and Dell/Compellent got into a rather public tiff regarding the performance of their SAN (Storage Area Network) arrays. We had been using their product for a few years with nothing but trouble and more trouble. Nearly all of the customer-facing outages we incurred could be traced directly back to issues on the SAN. There was incredible pressure from internal management and customers to dump the thing into the Allegheny River and get something else. (Well, not really dump it into the river, but writing off a quarter-million-dollar storage device before it is depreciated is darned close to dumping it into the river…)

An article showed up in The Register (www.theregister.co.uk) discussing all of the great new features Compellent was working on for its upcoming software upgrade. We had just struggled through another burp on the SAN, for which yet another RCA (Root Cause Analysis) had to be presented to the customer pointing to the same problem. I could not keep my thoughts about the product to myself and laid into them with a scathing review in the article’s comments section (see here). Within days, I was seeing high-ranking Dell executives investigating my background on LinkedIn.com and elsewhere. Account managers and their managers were on the phone with my boss to get to the bottom of the problems. It got toasty!

It paid off.

Dell assigned us a team of top-level specialists to get the issues resolved. The entire SAN configuration was reviewed, and software releases were changed (some went up, some went down). Configuration changes were made, and more. The end result: the machine has (mostly) been behaving itself these last few months. The issues we have seen have not been severe enough to affect product availability, and they have been properly investigated and escalated. Amazingly, we are also now in agreement as to the root cause of all the troubles we have seen.

So… Dell/Compellent… Thanks for stepping up to the plate and helping resolve the issues we faced. Many thanks to the many people who helped to get us back on the right track. There is still more work to do, but the results are already putting smiles on people’s faces.

Halloween Pumpkin Light Project

Every year we make a pumpkin to decorate the front porch for Halloween. My daughter, the artist in the family, traces the design on the pumpkin, I cut it out, we add a generic pumpkin light, and voila! A pumpkin is ready.

This year I thought I would do something different. I had a few strips of red/blue/white 12 V LEDs my Dad once gave me (and he is impatiently waiting for me to do something with them!), so I decided to build a mood-light-based red/blue LED driver, with the white LEDs flashing to create a sparking effect.

Rather than use a full Arduino for this project, I decided to try my hand at using the smaller-format ATtiny84 on a custom board. I had recently bought some for tinkering, and this seemed like the perfect first project. It was much easier than I thought: I installed the AVRisp software on my Ruggeduino, connected the leads as required, and added a few LEDs for testing. Works!

Ruggeduino as AVRisp


Why a Ruggeduino, you ask? EZ. I have already blown up or used up all of my ATmega328 chips on other projects, and I have yet to find a way to destroy the Ruggeduino. Wish I could say that about the UNO you see in the background of the pic above… but that is a story for another day.

With the ATtiny programming issues sorted out, it was time to figure out how to drive the 12 V LED strips from the ATtiny. Despite using high-gain A63 Darlington NPN transistors, I was not able to get the LEDs to full brightness with a single transistor. I found that if I used a 2N2222 to pull the base to ground, I could get the desired light output.

Prototyping the Pumpkin LED driver


The software is a mashup of a mood-light sketch and a firelight sketch I had lying around. I added potentiometer control to limit the brightness of the flashing white light, which should make the board more universal for other seasons. Once I was satisfied the design worked, I put it all together on a piece of Radio Shack protoboard I had lying about. I wrapped short LED strips around a cardboard tube and dangled it from the top of the pumpkin.

LED controller board ready for use



A more visual representation of the project can be seen in a short video I uploaded to YouTube earlier today.

Oh… do you think any of the kids noticed the cool pumpkin by the door? sigh… damn candy…


Happy Halloween!