Overload again and again and again! - EDN

2022-06-18 20:34:27 By : Mr. inati wu

In the late 1990s I worked for a company that manufactured subsystems for rail transit cars: door operators, heating, ventilating, and air conditioning (HVAC), and other products. I was hired to be a “fireman”: not the real type, but someone to take care of the technical problems that plague any project.

About a month after my hiring, my boss came into my cubicle and told me he had a problem. One of our products, an HVAC system, was tripping the Auxiliary Power Unit (APU), and I should investigate. He also told me the design was from a company that we had acquired and that none of the people who designed it were working with us. I understood this meant I was on my own.


Let me give you some background information on subway cars and tramways.  Most cars are DC powered.  Power comes in, at 600 V to 750 V depending on the system, and is sent to the traction system and to the APU.  The APU converts the DC to three-phase AC, so it is an inverter.  In our case the 60 kVA APU was powering only one thing, the HVAC unit.

My boss told me that I had to be at our HVAC factory, in upstate New York, by the next morning to witness the final test on a unit that was being delivered. 

Next morning I’m on the factory floor looking at the HVAC unit. It is a top-mounted unit; it will be installed on the tramway’s roof. It is about 3.5 meters long and 2.5 meters wide. A separate control box is installed in the car ceiling for easy access. The technician plugs the control box into a low voltage DC power supply, connects the high voltage input to the factory 208 V three-phase 60 Hz supply, turns on the DC supply, switches on the AC circuit breaker, and runs through the final test. I see no sparks, smoke, loud humming, or other signs of overload or stress. I talk with the technician and he tells me he has not seen any indication that the units manufactured to date cause overloads during the tests.

The next day I reported my findings to both my boss and the project manager. My next action is to go to the transit authority facility and investigate.

A couple of days later I am in a tramway car at the transit authority main repair shop.  It is a new transit system: new cars, new tracks, and new shop.  I meet with the transit authority technicians, the APU technician, and the car builder representative.  They all confirm that the HVAC is repeatedly overloading the APU.  The APU is designed specifically for the HVAC load for these cars.  The transit authority requirements for the APU, the HVAC and all other systems are all written clearly in the request for bid documents.  The APU technician informs me that the APU monitors its output current and after three overloads it will shut down permanently.  Afterwards, the technician has to insert a USB thumb drive with special software to reset the APU.  To demonstrate, the technician powers the car on and then off, and the third time he powers it up, the APU trips.  We repeat the test a few times with the same result. 
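The APU behaviour the technician describes can be pictured as a small counter with a latching lockout. The Python sketch below is only an illustration under stated assumptions: the class name `ApuOverloadLockout`, the per-unit load values, and the idea that the overload count survives a power cycle are hypothetical and are not taken from the APU's documentation.

```python
class ApuOverloadLockout:
    """Toy model of an APU that latches off after a set number of overloads.

    Loads are expressed per unit of the APU rating (1.0 = rated load).
    All names and values here are assumptions for illustration only.
    """

    def __init__(self, max_overloads=3):
        self.max_overloads = max_overloads
        self.overload_count = 0   # assumed to persist across power cycles
        self.locked_out = False

    def power_up(self, load_per_unit):
        """Simulate one power-up and return 'ok', 'overload' or 'lockout'."""
        if self.locked_out:
            return "lockout"
        if load_per_unit > 1.0:
            self.overload_count += 1
            if self.overload_count >= self.max_overloads:
                self.locked_out = True
                return "lockout"
            return "overload"
        return "ok"

    def service_reset(self):
        """Stand-in for the technician's reset with the special software."""
        self.overload_count = 0
        self.locked_out = False


apu = ApuOverloadLockout()
# Three power-ups with everything switched on (about twice the rated load).
for attempt in range(1, 4):
    print(attempt, apu.power_up(2.0))
```

In this model the first two overloaded power-ups are counted but recoverable, and the third one latches the APU off until `service_reset()` is called, which mirrors the demonstration in the repair shop.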

We came to the conclusion that to determine if the APU or the HVAC is the source of these overloads, we will have to take measurements in a car.  This means having access to a lot of test equipment, a test car, personnel and other resources.  I tell them I will go back and discuss this with my boss and the project manager to coordinate with all involved.

The next day my boss and the project manager both agree that measurements in a car are the way to go.  The project manager informs me that the HVAC control box I requested is in my cubicle.  I tell them that I will check it out. I hope to find a simpler solution than one that requires days of expensive on-site tests. 

The HVAC control box is the size of a tower type PC. It has two connectors.  The CPU reads the temperature then sends commands to the contactors and that’s it.  The CPU writes the command words into two 8-bit latches; the latches control transistor buffers, and the buffers control low power relays. Finally, the relays send power to the contactor coils mounted in the HVAC unit itself.  This way the high voltage is present only on the top-mounted HVAC unit and not in the control box.

The turn-on reset circuit is complex.  The CPU has its own reset circuit and the two output latches each have a separate RC network connected to the output enable pin.  I connected oscilloscope probes to the latch output enable pin and to the write pin.  The CPU writes data into the latch after the output enable is released!  At power on, the latches are in an undefined state and the outputs could turn all the relays on.  This means the fan motor, the compressor motor and all the heating elements are on.  This amounts to twice the rated APU power.  So this is likely the origin of the overload. It also explains why this was never seen during the final systems test at our factory.  The test technician always applies DC for a few seconds before he can reach the AC switch, so the CPU has already written valid data into the latches by the time the high voltage is present.
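The race seen on the oscilloscope can be expressed in a few lines. The sketch below is a simplified model, assuming the output enable pin follows a single-pole RC exponential and that the latch outputs become active once the pin has covered a given fraction of its total voltage swing. The function names, the component values, and the CPU write time are made-up placeholders, not measurements from the control box.

```python
import math


def enable_release_time(r_ohm, c_farad, swing_fraction):
    """Time for an RC node to cover `swing_fraction` (0..1) of its total swing.

    Uses the standard first-order response: t = -R * C * ln(1 - fraction).
    """
    return -r_ohm * c_farad * math.log(1.0 - swing_fraction)


def undefined_output_window(t_enable_s, t_last_write_s):
    """Time during which the outputs are enabled but not yet written by the CPU."""
    return max(0.0, t_last_write_s - t_enable_s)


# Hypothetical values, for illustration only.
t_enable = enable_release_time(r_ohm=10e3, c_farad=1e-6, swing_fraction=0.5)
t_write = 50e-3  # assumed time for the CPU to boot and write both latches

window = undefined_output_window(t_enable, t_write)
print(f"enable released after {t_enable * 1e3:.2f} ms")
print(f"outputs undefined for {window * 1e3:.2f} ms")
```

Any nonzero window means the relays can be driven by whatever state the latches powered up in, which is the condition described above.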

I measured the time the CPU takes to write into both latches.  I computed a new worst case value for the resistor in the output enable RC network.  I changed the circuit from two separate RC networks to a single RC network with a longer time constant driving the two output enable pins.  I also added a diode in the place left free by removing one of the resistors.  This will give a faster discharge to the capacitor if power is lost momentarily.  In case of a short duration power loss, the latch will lose its content but the capacitor may not have time to discharge sufficiently.  The diode takes care of that case.
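One way to arrive at a worst-case resistor value is sketched below. It is not necessarily the calculation used on this project: it assumes a first-order RC response, takes the shortest possible delay (smallest capacitance, smallest resistance, lowest switching threshold) and requires that delay to exceed the measured CPU write time by a safety margin. The function name, tolerances, and all numbers are hypothetical.

```python
import math


def min_nominal_resistance(t_write_max_s, c_nom_f, c_tol, r_tol,
                           swing_fraction_min, margin=2.0):
    """Smallest nominal R so the enable delay always exceeds margin * t_write.

    The shortest delay occurs with the lowest C, the lowest R, and the
    smallest fraction of the swing needed to enable the latch outputs.
    """
    c_min = c_nom_f * (1.0 - c_tol)
    k_min = -math.log(1.0 - swing_fraction_min)   # delay = R * C * k
    r_actual_min = margin * t_write_max_s / (c_min * k_min)
    return r_actual_min / (1.0 - r_tol)           # convert to nominal value


# Hypothetical inputs: 50 ms write time, 10 uF +/-20 %, 5 % resistor,
# outputs enable after at least 30 % of the swing, 2x safety margin.
r_nom = min_nominal_resistance(50e-3, 10e-6, 0.20, 0.05, 0.30, margin=2.0)
print(f"choose a standard resistor value of at least {r_nom / 1e3:.1f} kohm")
```

In practice the next higher standard resistor value would be picked, and the longer time constant is also why a discharge diode becomes useful after a brief power loss.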

The modified unit worked fine and the delay between the write pulse and the output enable is conservative.  I informed my boss and the project manager of my findings and described the simple modifications: remove three parts, add one resistor, a diode, and one wire.  We decide that I will modify units to equip one train and send them to be tested by the transit company.

A couple of weeks later we got feedback that the modified units have cured the problem.  I sign off on the ECO to have all the units retrofitted: case closed.

The takeaway is: always check the overall system reset timing with the final software before releasing the product; if possible, test with the final system power supply, try different power-on sequences, and be ready to make modifications during system integration.

Daniel Dufresne is a retired engineer and has worked in telecommunications, mass transit, consumer products, and high power electronics and custom instrumentation design. He also was a professor at Cegep de Saint-Laurent and taught courses at Ecole Polytechnique de Montreal.  Daniel published articles in Audio, EDN, Electronic Design and other publications. He lives in Montreal, Canada and still works on electronic projects and test equipment modifications and repairs.

Using an RC (or other simple timer) to enable a system is STUPID, STUPID, STUPID! This is how old microprocessor RESET lines were handled, & it led to PROBLEMS if the Vcc “came up” too slowly. Some manufacturers (such as Dallas with its “universal timer” chip) used this circuit INTERNALLY with a femtofarad timing capacitor on the RC. If the Vcc had even the smallest bypass cap on it, the chip would fail to initialize. Vcc had to rise in MICROseconds! I was going to use one such chip in a critical military system. I ditched it in favor of 4000 CMOS ICs. The PROPER way to do this is to have a circuit check the system for “validity to proceed” (for a microprocessor, verify system voltage is above the minimum spec’d level), THEN fire the RC. For this circuit, the CPU should write an ENABLE line AFTER the 2 latches are written with valid data. You can route THAT to the RC timer. The ports should be resistor’d to the INACTIVE state. (Hopefully, the ports “come up” in a floating state on power-up of the CPU. If this had been true, the problem would not have occurred.) Putting in a big delay may fix the problem now, but will lead to trouble later. A system should NEVER rely upon the relationship between 2 INDEPENDENT timings to work properly! What will happen if the CPU clock takes longer to start up or doesn’t start up at all? The “problem” should be limited to a “dead” CPU ONLY, not a latched-off power supply. The best way to eliminate this problem entirely is to use the newer “variable-frequency” drivers for the compressors & fans. These ramp the power on slowly, so even if all are started at once, there is NO surge. It simply won’t matter! A challenge with the newer parts is assuring that EVERYTHING THAT MATTERS is inactive until the processor is in control. That is an ANALOG problem because we are specifying performance BEFORE voltages reach “spec’d” levels. That problem extends into the circuitry of the controlled device also.
A compressor relay driver that glitches ON during powerup will result in the supply shutdown, just like mistiming of control signal from the CPU.

Hello Mr Park, I agree with you about the use of a dedicated power-on-reset chip. Power-on reset circuit design is not to be left to novices. I do prefer ICs that also monitor the supply line and issue a reset pulse only after the supply voltage is acceptable and stable. The timing should allow ample time for the clock circuit to stabilise, provided the CPU or MCU manufacturer datasheet covers that information. The use of a watchdog in microprocessor and microcontroller applications is also, in my opinion, a must. In early 1980 I was part of a team that designed a digital cable converter box. We implemented a watchdog timer that is enabled by the CPU and that cannot be stopped. The only way to disable it is by a power-on reset. To continue running, the CPU has to write a specific word into the appropriate location. It is allowed one miss: if the CPU does not respond, there is a system reset at the end of the next time slot. I also agree that the processor should actively command the release of the latched data to the output. Do not forget that this HVAC control box was from another company and came with little documentation, that some units were already in the field, and that production was still running; this limits the available solutions. Total production run is less than one hundred units, with the electronics being second in importance to the HVAC proper. The use of variable voltage variable frequency (VVVF) inverters adds cost and weight to tramway cars and is not always the best solution. Commercial off-the-shelf motor drives are not designed for rail transit operation. Transit authorities limit car weight in the call for bid. More weight means more power needed for traction, more problems in balancing the load on the wheels, stronger car bodies and higher operating costs. On other projects, we used VVVF inverters because we designed the inverter, the load was a single motor and the on-off control signal was sent to the inverter.
The above case history deals only with the technical aspects. In the company I worked for, the project manager had complete authority and responsibility for that project. Before selecting the solution to be implemented, I presented him with various solutions.
Solution one: a complete redesign of the controller board to eliminate the reset issues and the absence of a watchdog, with added engineering costs and delays, production costs and delays, the need to redo the qualification tests with added costs and delays, units already in the field swapped with new control boxes, old control boxes returned to the factory for board swapping, old boards recycled for metals, and transit authority late delivery fees added on to the total cost. The manager’s answer: No, too costly and too many delays.
Solution two: design of an add-on reset board aimed at the reset issues only, with lower engineering costs and delays than solution one above, lower production costs and delays related only to the add-on board, no qualification test costs or delays, units already in the field swapped with new control boxes, old control boxes returned to the factory for modification, a small volume of recycled or waste materials, and no transit authority late delivery fees. The manager’s answer: No, too costly and too many delays.
Solution three: use the new resistor value with the added diode and wire, with lower engineering costs and delays than the two other solutions, no board production costs or delays, no qualification test costs or delays, new units modified during production, units already in the field swapped with new control boxes, old control boxes returned to the factory for modification, a small amount of recycled or waste materials, minimum delay in having problem-free units installed, and no transit authority late delivery fees. The manager’s answer: Yes to the solution with the lowest cost and shortest delays.
Thanks for posting your comments.
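The watchdog scheme mentioned in the reply above (cannot be stopped once enabled, fed by writing a specific word, one missed time slot tolerated) can be modelled roughly as follows. This is a behavioural sketch only: the class name, the key word `0xA5`, and the assumption that a good write clears the miss count are hypothetical and are not details of the original converter box design.

```python
class OneMissWatchdog:
    """Behavioural model of a watchdog that tolerates a single missed slot."""

    def __init__(self, key_word=0xA5):    # key word value is an assumption
        self.key_word = key_word
        self.enabled = False
        self.fed = False
        self.misses = 0

    def enable(self):
        """Called once by the CPU; there is deliberately no disable method."""
        self.enabled = True

    def power_on_reset(self):
        """Only a power-on reset returns the watchdog to its disabled state."""
        self.enabled = False
        self.fed = False
        self.misses = 0

    def write(self, word):
        """CPU write to the watchdog location; only the key word counts."""
        if word == self.key_word:
            self.fed = True

    def end_of_slot(self):
        """Advance one time slot. Return True if a system reset is issued."""
        if not self.enabled:
            return False
        if self.fed:
            self.fed = False
            self.misses = 0       # assumed: a good write clears the miss
            return False
        self.misses += 1
        if self.misses >= 2:      # second consecutive miss: reset the system
            self.misses = 0
            return True
        return False


wd = OneMissWatchdog()
wd.enable()
wd.write(0xA5)
print(wd.end_of_slot())   # fed in time
print(wd.end_of_slot())   # first miss, tolerated
print(wd.end_of_slot())   # second miss in a row
```

The point of the design is that a hung CPU cannot switch the watchdog off, so a stuck program always ends in a reset within two time slots.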

The above case history manages the specialized angles. In the organization I worked for, the undertaking supervisor had total power and obligation regarding that task. Before choosing the answer to be executed, I gave him different arrangements. Arrangement one, a total update of the regulator board to kill reset issues and nonappearance of guard dog with added designing expenses and postponements, creation expenses and deferrals, the need to re-try the capability tests with added expenses and deferrals, units currently in the field would be traded with new control boxes, old control boxes got back to the processing plant for board trading, old sheets reused for metals, and travel authority late conveyance charges added on to the all-out cost.

Hi TiyaAlford, your comment describes a situation similar to mine.

The boss is the boss, whatever his title is.
