Tag Archives: Fail

Patch, Patch and Patch Some More

As some of you know, I look after a datacenter dedicated to providing a specialized cloud service for the printing industry. We use servers and infrastructure similar to those of another datacenter I know: IBM x3690 and x3850 enterprise-grade servers, 10G networking, VMware virtualization, etc. These servers do not have the fastest processors, but they have awesome online out-of-band diagnostics, can handle terabytes of RAM and will have replacement parts delivered PDQ. Ours have been running for the past two years without failures or data loss.

In my talks with the manager of the other datacenter, I have been hearing complaints about these servers: frequent crashes, data loss, disk failures and more. Huh? They have the same hardware, and are running a newer VMware release (5.1) than we are (5.0). Digging deeper, I began to see a pattern: out-of-date patches. I usually run no more than 4 months behind on host patches, but most of their systems were still at the original release level. And it goes deeper than that.

In a virtual datacenter, there are five levels of patches:

  • Application
  • Operating System
  • Hypervisor (VMware, Hyper-V, Xen, etc.)
  • Firmware
  • BIOS

Most users only see the top two levels, through application updates, Windows service packs, etc. However, when you run a virtual datacenter, the bottom three levels become very important. A VMware host may run up to 100 virtual machines, and an outage on that host affects every machine it hosts. Making sure the host stays operational becomes Very Important.

VMware and Microsoft both provide simple mechanisms to keep the virtualization environment up to date. Microsoft does this via the Windows Update function. VMware provides a free add-on and plugin called Update Manager. Both of these tools are very easy to use and only require a brief maintenance window to update the hypervisor and reboot the host.

When you buy a host and its accessories from HP or IBM (and, I assume, from Dell), you can use the update tools they provide to bring the firmware for all the accessories up to date with a single reboot. IBM provides UpdateXpress and HP provides the new HP Service Pack for ProLiant utility. Both tools can be configured to bring the host system up to date with the latest fixes for all of the RAID controllers, disks, network interfaces, BIOS, remote management, etc.

All of the updates provided by HP and IBM come with release notes. Think back to the last time your server crashed. If you think it was storage related, find out the firmware level of your RAID controller and look up the latest firmware update for that controller. Read through the release notes: don't be surprised to find that the issue you ran into has already been fixed… in a release you are not running on your server. Time bomb? You bet! No, really, you are betting with your paycheck.
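
Finding out whether you are behind comes down to a version comparison, and that is easy to get wrong by eye or with a plain string compare. Below is a minimal C++ sketch that compares dotted, purely numeric version strings field by field. The function names are hypothetical, and real vendor firmware versions may carry letters or build suffixes that this sketch does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a dotted version such as "2.10.1" into numeric fields.
// Assumes every field is a plain non-negative integer (no vendor suffixes).
std::vector<int> parseVersion(const std::string& version) {
    std::vector<int> fields;
    std::stringstream stream(version);
    std::string field;
    while (std::getline(stream, field, '.')) {
        assert(!field.empty() && "empty version field");
        fields.push_back(std::stoi(field));
    }
    return fields;
}

// Returns -1 if a < b, 0 if equal, 1 if a > b, comparing field by field.
// Missing trailing fields count as 0, so "1.2" equals "1.2.0".
int compareVersions(const std::string& a, const std::string& b) {
    const std::vector<int> va = parseVersion(a);
    const std::vector<int> vb = parseVersion(b);
    const std::size_t n = va.size() > vb.size() ? va.size() : vb.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int x = i < va.size() ? va[i] : 0;
        const int y = i < vb.size() ? vb[i] : 0;
        if (x != y) return x < y ? -1 : 1;
    }
    return 0;
}

// Hypothetical helper: true when the installed firmware is older than the latest release.
bool firmwareIsBehind(const std::string& installed, const std::string& latest) {
    return compareVersions(installed, latest) < 0;
}
```

For example, `firmwareIsBehind("2.9.5", "2.10.1")` returns true, whereas an alphabetical string comparison would put "2.9.5" after "2.10.1". These version numbers are made up for illustration.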

So... patch! Every three to four months, patch completely. Find the tools, learn them and use them. It is better to take a controlled, scheduled outage than an unscheduled one in the middle of the day, with data loss.
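
If you want to enforce the three-to-four-month rule across all five levels listed above, it helps to record when each level was last brought up to date on each host. The C++ sketch below assumes you keep that bookkeeping yourself as "days since last update". `HostPatchState`, `stalePatchLevels` and the 120-day threshold are hypothetical names and values for illustration, not part of any vendor tool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The five patch levels from the list above, top to bottom.
enum class PatchLevel { Application, OperatingSystem, Hypervisor, Firmware, Bios };

constexpr std::array<PatchLevel, 5> kAllLevels = {
    PatchLevel::Application, PatchLevel::OperatingSystem, PatchLevel::Hypervisor,
    PatchLevel::Firmware, PatchLevel::Bios};

// Hypothetical bookkeeping record: days since each level was last fully
// updated on one host, indexed by PatchLevel.
struct HostPatchState {
    std::string name;
    std::array<int, 5> daysSinceUpdate;
};

// "Every three to four months": 120 days is an assumed threshold.
constexpr int kMaxPatchAgeDays = 120;

// Returns the levels on this host that are older than the policy allows.
std::vector<PatchLevel> stalePatchLevels(const HostPatchState& host,
                                         int maxAgeDays = kMaxPatchAgeDays) {
    assert(maxAgeDays > 0);
    std::vector<PatchLevel> stale;
    for (PatchLevel level : kAllLevels) {
        if (host.daysSinceUpdate[static_cast<std::size_t>(level)] > maxAgeDays) {
            stale.push_back(level);
        }
    }
    return stale;
}
```

A host that is current on applications and the OS but still at its original hypervisor, firmware and BIOS levels would report three stale levels. That is roughly the situation described at the other datacenter.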

Good luck! Don’t let that unscheduled outage kick you in the a$$!


Dual Speed Swimming Pool Pump Controller

A couple of years ago, I installed a dual-speed motor for the swimming pool pump. It works great: rather than burning 9 amps @ 220V to circulate the pool water, it can run at half speed on 2 amps @ 220V. As you can imagine, this saves quite a bit of money and is also much quieter.
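
The savings are easy to estimate. The C++ sketch below multiplies volts by amps, which gives apparent power (VA). For a motor, the real power is lower by the power factor, which is not given here, so these figures are upper bounds. The daily runtime, season length and electricity price are caller-supplied assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Apparent power in volt-amperes. For a motor, real power is V * I * PF,
// so with PF unknown this is an upper bound on the watts drawn.
double apparentPowerVA(double volts, double amps) { return volts * amps; }

// Energy in kWh for a load of `watts` running `hoursPerDay` for `days` days.
double energyKWh(double watts, double hoursPerDay, double days) {
    return watts * hoursPerDay * days / 1000.0;
}

// Hypothetical helper: cost saved by running the low-speed winding instead of
// the high-speed one. Runtime, season length and price are assumptions.
double savingsPerSeason(double volts, double highAmps, double lowAmps,
                        double hoursPerDay, double days, double pricePerKWh,
                        double powerFactor = 1.0) {
    assert(powerFactor > 0.0 && powerFactor <= 1.0);
    const double highW = apparentPowerVA(volts, highAmps) * powerFactor;
    const double lowW = apparentPowerVA(volts, lowAmps) * powerFactor;
    const double savedKWh = energyKWh(highW, hoursPerDay, days) - energyKWh(lowW, hoursPerDay, days);
    return savedKWh * pricePerKWh;
}
```

With the figures above, the high-speed winding draws at most 1,980 VA and the low-speed winding at most 440 VA, a 4.5:1 ratio. Assuming, for example, 12 hours a day for 100 days at 0.10 per kWh, the difference works out to 1,848 kWh, or about 185 in whatever currency the rate is quoted in.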

Unfortunately, there is a downside. If the pool heater starts heating the water when the pump is running at low speed, the water begins to boil in the heater. Not good. The boiling sounds are quite loud, and when I hear them, I can run outside and flip the switch to run the pump at high speed. This is not ideal. I am not always at home or awake… sadly.

To solve the problem, I have been working on a motor control circuit that measures the water temperature as it enters and as it leaves the pool heater. If the output temperature is at least 1°C higher than the input temperature, the controller activates the high-speed winding on the pump motor. When the temperature differential drops below 1°C, the controller waits about 15 more minutes and then returns the pump to low speed (eco).
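
The control rule above is a small state machine with a hold timer. Here is a minimal C++ sketch of it, written so it could be called from an Arduino `loop()` with the value of `millis()`. The `HeaterGuard` class and its names are hypothetical, not the actual controller firmware. It restarts the 15-minute timer on every reading that still shows a differential of 1°C or more.

```cpp
#include <cassert>
#include <cstdint>

enum class PumpSpeed { Low, High };

// Hypothetical model of the heater-protection rule: switch to HIGH while the
// heater outlet is at least 1 C warmer than the inlet, then stay on HIGH for
// 15 minutes after the last such reading before returning to LOW (eco).
// Times are milliseconds, e.g. from Arduino's millis().
class HeaterGuard {
public:
    static constexpr float kTriggerDeltaC = 1.0f;
    static constexpr std::uint32_t kHoldMs = 15UL * 60UL * 1000UL;

    PumpSpeed update(float inletC, float outletC, std::uint32_t nowMs) {
        if (outletC - inletC >= kTriggerDeltaC) {
            speed_ = PumpSpeed::High;
            lastHotMs_ = nowMs;  // restart the hold timer on every hot reading
        } else if (speed_ == PumpSpeed::High) {
            // Unsigned subtraction stays correct across millis() wrap-around.
            const std::uint32_t elapsed = nowMs - lastHotMs_;
            if (elapsed >= kHoldMs) speed_ = PumpSpeed::Low;
        }
        return speed_;
    }

    PumpSpeed speed() const { return speed_; }

private:
    PumpSpeed speed_ = PumpSpeed::Low;
    std::uint32_t lastHotMs_ = 0;
};
```

Using unsigned subtraction for the elapsed time means the hold timer keeps working when the 32-bit millisecond counter wraps around, which happens after about 49.7 days.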

The controller has two switches, one to force the motor to run at high speed (when you need to clean the pool) and the other to put the system into maintenance mode (motor stopped, to clean the filter, etc.), plus a 2×20 LCD to show the current status. The pic below is an early prototype without the display header.
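
The two switches need a clear priority over the automatic logic. The C++ sketch below assumes that maintenance mode wins over force-high, which in turn wins over the temperature-based decision (passed in as `autoWantsHigh`). That ordering, the function names and the status strings are assumptions for illustration. Each status string fits on one 20-character line of the 2×20 display.

```cpp
#include <cassert>
#include <cstring>

enum class MotorCommand { Off, Low, High };

// Assumed priority: maintenance (motor stopped) beats force-high,
// which beats the automatic temperature logic.
MotorCommand arbitrate(bool maintenance, bool forceHigh, bool autoWantsHigh) {
    if (maintenance) return MotorCommand::Off;
    if (forceHigh) return MotorCommand::High;
    return autoWantsHigh ? MotorCommand::High : MotorCommand::Low;
}

// Hypothetical status text for one line of the 2x20 character LCD.
const char* statusLine(bool maintenance, bool forceHigh, bool autoWantsHigh) {
    if (maintenance) return "Maint: pump off";
    if (forceHigh) return "Forced high speed";
    return autoWantsHigh ? "Auto: high (heating)" : "Auto: low (eco)";
}

constexpr std::size_t kLcdColumns = 20;

// True if the text fits on a single LCD line without wrapping or clipping.
bool fitsOnLcdLine(const char* text) { return std::strlen(text) <= kLcdColumns; }
```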

The brain of the controller is an ATmega328 microcontroller with an Arduino bootloader. It monitors the waterproof temperature probes and the switch settings. An MC14555 binary-to-1-of-4 decoder is used to activate the relays. Theoretically this part is redundant, but since I trust programmers about as far as I can throw them (and I can't throw myself very far either…), I added it to make sure that both motor windings can never be activated at the same time: the chip only activates one output pin at a time.
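
Why does the decoder make it impossible to energize both windings? With two select inputs, it can only ever raise one of its four outputs. The C++ sketch below is a behavioral model of one half of the chip. It assumes MC14555-style behavior: active-high outputs, and an enable input that forces all outputs low when it is high. Which outputs drive which relay (`kLowRelayOutput`, `kHighRelayOutput`) is a hypothetical wiring choice, so check the datasheet and your own schematic.

```cpp
#include <array>
#include <cassert>

// Behavioral model of one half of a dual binary-to-1-of-4 decoder, assuming
// MC14555-style behavior: select inputs A (LSB) and B (MSB), active-high
// outputs Q0..Q3, and an enable input that forces every output low when HIGH.
std::array<bool, 4> decode1of4(bool a, bool b, bool enableHigh) {
    std::array<bool, 4> q{};  // start with all outputs low
    if (!enableHigh) {
        q[(b ? 2 : 0) + (a ? 1 : 0)] = true;  // exactly one output goes high
    }
    return q;
}

// Hypothetical wiring: Q1 drives the low-speed relay, Q2 the high-speed relay;
// selecting Q0 (or the unused Q3) leaves both windings off.
constexpr int kLowRelayOutput = 1;
constexpr int kHighRelayOutput = 2;

// Number of outputs currently high; for this decoder it is never more than one.
int activeOutputs(const std::array<bool, 4>& q) {
    int n = 0;
    for (bool out : q) n += out ? 1 : 0;
    return n;
}
```

Whatever the firmware writes to the two select lines, at most one relay output can be high, so a software bug can pick the wrong speed but cannot energize both windings at once.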

I had initially planned for the switching to be done by a pair of solid state relays. Benchtop testing successfully powered an incandescent bulb and aff. The testing was done using 110V (!). The controller was mounted in a spare outdoor electrical box left over from the hot tub I used to have. For power, I dug out an old Nokia power supply and connected it between one side of the 220V motor circuit and ground (!).

I wired up the controller and the motor, installed the temperature probes for the heater, double-checked all my wiring and, when satisfied, flipped the master switch. A short blink on the LCD and then it went dark. After considerable effort rechecking the connections, I went to the basement to check the circuit breaker for the pool. The breaker is a Ground Fault Interrupter (GFI) breaker: if it senses current leaking to ground, it trips. Oops. The power supply was connected to ground, not neutral.
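
Here is why the breaker tripped. A GFI compares the current flowing out on the conductors it monitors with the current flowing back on them, and trips when they do not balance. Current that returns through the ground wire bypasses that comparison. The C++ sketch below is a simplified model that uses signed instantaneous currents. The 5 mA trip threshold and the 50 mA supply current in the usage note are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>

// Assumed trip threshold, in the range used by North American GFCIs that
// protect people (roughly 4 to 6 mA).
constexpr double kTripThresholdAmps = 0.005;

// Sum of the signed currents in every conductor that passes through the
// breaker's sensing coil (outgoing positive, returning negative). Current
// that comes back on the ground wire never enters this sum.
double residualCurrent(std::initializer_list<double> monitoredAmps) {
    double sum = 0.0;
    for (double amps : monitoredAmps) sum += amps;
    return std::fabs(sum);
}

// Hypothetical helper: true if the imbalance exceeds the trip threshold.
bool gfciTrips(std::initializer_list<double> monitoredAmps) {
    return residualCurrent(monitoredAmps) > kTripThresholdAmps;
}
```

With the motor alone (2 A out on one leg and 2 A back on the other), the sum is zero. A supply that draws an assumed 50 mA from one leg and returns it through ground leaves a 50 mA imbalance, ten times the threshold, so `gfciTrips({2.05, -2.0})` is true. If the supply returns through a monitored neutral instead, the sum balances again.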

Having resolved the GFI issue, I powered up the system once again. LCD lights up, displays the startup diagnostics, seems happy and reports the motor is now running in eco mode. Exactly as planned… except the motor is quiet! Flip the switch to go to high speed, the display updates correctly, but the motor is quiet. WTF?

I think I spent over a week researching this one online, and even resorted to discussing the issue with the guys at a local electrical supply store. By all indications, everything should have worked. I ended up bringing a 220V circuit into my lab and started testing. It took a few days of testing and a few blown light bulbs before I found the answer. Unlike European 220V, which uses a single live wire plus a neutral, North American 220V uses two live 110V wires, with the load connected between them. Nearly all solid state relays (SSRs) use a zero-crossing circuit that detects when the AC waveform crosses zero and switches at that point, when the voltage across the relay is at its minimum. As far as I can tell, this does not work on a North American 220V circuit: the zero-crossing detector cannot find the crossing point, so it never activates the relay. Back to the drawing board.
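
For readers who have not met the term, zero-crossing detection just means noticing when the AC voltage changes sign. The C++ sketch below finds those points in a list of voltage samples. A zero-cross SSR does the equivalent in hardware and only turns on near these points. The function is a hypothetical illustration of the technique, not the circuit inside any particular relay.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the indices i where the sampled voltage changes sign between
// samples i-1 and i (a rising or falling zero crossing).
// A sample that is exactly 0 is treated as non-negative.
std::vector<std::size_t> zeroCrossings(const std::vector<double>& volts) {
    std::vector<std::size_t> crossings;
    for (std::size_t i = 1; i < volts.size(); ++i) {
        const bool wasNegative = volts[i - 1] < 0.0;
        const bool isNegative = volts[i] < 0.0;
        if (wasNegative != isNegative) crossings.push_back(i);
    }
    return crossings;
}
```

On a 60 Hz supply, the voltage crosses zero 120 times per second, roughly every 8.3 ms.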

If SSRs are out of the question, then relays are the way to go. I have a 1 horsepower electric motor, so I went online and found a good deal on a bunch of 2 HP 250V relays. Perfect.

Once the relays arrived, I installed everything on a piece of stripboard. Since the outputs from the MC14555 are not powerful enough to drive the relays, I populated the board with two basic 2N2222-based relay drivers and a header for power and for the low and high relay inputs. I replaced the SSRs with this board and, amazingly, everything worked as planned. YAY!
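
Each 2N2222 driver needs a base resistor sized so the transistor fully saturates when it switches the relay coil. A common rule of thumb is to design for a forced current gain of about 10. The C++ sketch below does that arithmetic. The 5 V drive level, 0.7 V base-emitter drop and 70 mA coil current in the usage note are assumptions, not measurements from this build, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Base resistor for an NPN transistor (such as a 2N2222) switching a relay
// coil on the low side. Designing for a forced current gain of about 10
// keeps the transistor well into saturation, since its actual gain is
// normally much higher than that.
double baseResistorOhms(double driveVolts, double vbeVolts, double coilAmps,
                        double forcedBeta = 10.0) {
    assert(driveVolts > vbeVolts && coilAmps > 0.0 && forcedBeta > 0.0);
    const double baseAmps = coilAmps / forcedBeta;
    return (driveVolts - vbeVolts) / baseAmps;
}

// Largest standard E12 value not above `ohms`, so the base current is at
// least the design value.
double e12AtOrBelow(double ohms) {
    assert(ohms >= 1.0);
    static const double kE12[] = {1.0, 1.2, 1.5, 1.8, 2.2, 2.7,
                                  3.3, 3.9, 4.7, 5.6, 6.8, 8.2};
    const double decade = std::pow(10.0, std::floor(std::log10(ohms)));
    double best = kE12[0] * decade;
    for (double multiplier : kE12) {
        if (multiplier * decade <= ohms) best = multiplier * decade;
    }
    return best;
}
```

With those assumed values, the base current is 7 mA and the resistor works out to about 614 Ω, so you would use the next lower standard E12 value, 560 Ω. Check the relay's datasheet for its actual coil current, and make sure whatever drives the base can supply the resulting current.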

After a few weeks of operation, I started to notice the pump motor making odd sounds. Switching from low speed to high was not smooth: the motor would stutter and sometimes not engage. Other times the motor would start to stutter after it had been running at high speed for a while. More research. It seems these relays are technically rated for 2 HP motors, but are not intended for continuous duty. A bit of searching on eBay got me a few of the monstrous relays to the right. Two inches per side, but this one will probably keep running long after the pool has been filled in!

For the high-speed winding, I installed the new relay and kept the smaller relay for the low-speed winding. The new relay needed a new relay control circuit. During testing, I noticed an odd side effect: when power was applied to the control board, it would momentarily trigger both relays, sending a brief surge to both the low-speed and high-speed windings. Not good. I am using 2N2222 NPN transistors to switch the relays, with each relay between VCC and its transistor. If I placed the relay between the transistor and ground, it would avoid the surge, but the relay coils would not always trigger, or would trigger slowly, causing extended arcing (sparks). Eventually, I found that the simplest solution is a resistor/capacitor (RC) delay circuit. When power is applied, a resistor slowly charges a capacitor, and when the capacitor voltage exceeds the transistor's base threshold, the transistor turns on and lets current flow to the rest of the circuit. Simple, cheap, and it gives me a nice 10 s delay at power-on. The end result can be seen on the right, built on a small Radio Shack prototyping board.
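
The delay follows from the capacitor charging curve, V(t) = Vs(1 - e^(-t/RC)), solved for the time at which V reaches the transistor's turn-on threshold. The C++ sketch below computes the delay and, in the other direction, the resistor needed for a target delay. The component values are assumptions, the function names are hypothetical, and the sketch ignores the current the transistor base draws from the capacitor, which in a real circuit stretches the delay.

```cpp
#include <cassert>
#include <cmath>

// Time for a capacitor charging from 0 V through a resistor to reach
// `thresholdVolts`: Vc(t) = Vs * (1 - exp(-t / RC)), solved for t.
// Ignores the base current drawn from the capacitor node.
double rcDelaySeconds(double ohms, double farads, double supplyVolts, double thresholdVolts) {
    assert(thresholdVolts > 0.0 && thresholdVolts < supplyVolts);
    return ohms * farads * std::log(supplyVolts / (supplyVolts - thresholdVolts));
}

// Inverse: resistor needed for a target delay with a given capacitor.
double resistorForDelay(double seconds, double farads, double supplyVolts, double thresholdVolts) {
    assert(seconds > 0.0 && farads > 0.0);
    assert(thresholdVolts > 0.0 && thresholdVolts < supplyVolts);
    return seconds / (farads * std::log(supplyVolts / (supplyVolts - thresholdVolts)));
}
```

Assuming a 5 V supply and a 0.7 V base threshold, ln(5 / 4.3) ≈ 0.151, so a 10 s delay needs RC ≈ 66 s, for example about 66 kΩ with a 1,000 µF capacitor. Because the threshold sits low on the charging curve, the delay is only about 15% of the RC time constant.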

With the relays now working properly, I ran into another problem. When the high-speed winding was activated, the board would occasionally reset. Initially I thought this might be related to the small 500 mA power supply I was using, since the current draw when activating the new relay is quite significant. I replaced it with a larger 1 A power supply and the reset problems went away, sort of. The LCD controller still tends to lose its mind when the high-speed winding starts. After a few months of pondering, I have come to the conclusion that the problem may be a drop in source voltage on the 220V side when the motor starts. Adding a large 2,000 µF (mfd) capacitor to the DC power bus will probably solve the problem.
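
Whether a bulk capacitor will help depends on how long the dip lasts and how much current the board draws during it. While the capacitor alone supplies the load, the rail falls by ΔV = I·Δt / C. The C++ sketch below rearranges that for hold-up time and for required capacitance. The load current and allowable droop in the usage note are assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// How long a bulk capacitor can carry a constant-current load, with no help
// from the supply, before the rail sags by `allowedDropVolts`: dt = C * dV / I.
double holdUpSeconds(double farads, double allowedDropVolts, double loadAmps) {
    assert(farads > 0.0 && allowedDropVolts > 0.0 && loadAmps > 0.0);
    return farads * allowedDropVolts / loadAmps;
}

// Capacitance needed to ride through a dip lasting `seconds`: C = I * dt / dV.
double capacitanceForHoldUp(double seconds, double allowedDropVolts, double loadAmps) {
    assert(seconds > 0.0 && allowedDropVolts > 0.0 && loadAmps > 0.0);
    return loadAmps * seconds / allowedDropVolts;
}
```

With the 2,000 µF capacitor mentioned above, an assumed 200 mA load and 1 V of allowable droop, the capacitor carries the board for about 10 ms, a little more than half a 60 Hz cycle. A motor-start dip that lasts longer would need proportionally more capacitance.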

It is winter and -10°C (14°F) outside today. There are still a few months to go until the weather is warm enough to re-open the pool. There are still quite a few things to do with the controller. Aside from the power issue, which still needs to be resolved, there are quite a few bugs in the code. In the latest iteration of the code, for some reason it is forgetting to read the thermometers… I would also like to move the board from its protoboard build to a custom-designed board. All of the relay changes have also made a mess of the power cable routing in the box. The ribbon cable to the display has failed a couple of times, so that needs to be improved. There is also a problem with the serial interface: I have about a 30% success rate when attempting to reprogram the board with my FTDI adapter.