Disaster by Design/Safety by Intent #19
Disaster by Design
IPTE in the nuclear industry stands for Infrequently Performed Tests or Evolutions. It describes the measures applied before undertaking planned activities that are not routinely conducted. When tasks are performed on a daily or weekly basis, workers develop proficiencies that are maintained by habit. But when workers have not performed a task in quite a while, it is possible that their awareness of the proper steps has diminished, or that the proper steps have been revised.
If familiarity breeds contempt, lack of familiarity breeds confusion. And confusion at the controls of a nuclear power plant is seldom helpful.
IPTE covers an array of measures undertaken to prepare workers for upcoming activities. Workers may go to a control room simulator to rehearse the steps to be taken during the activities. They might sit through classroom training that reviews the applicable procedures. And supervisors commonly hold pre-job briefings with workers right before activities begin to ensure everyone knows their assigned roles and responsibilities.
One cannot tally the number of problems and miscues prevented by the IPTE measures. But one can see when the measures fail and IPTE becomes Improperly Performed Tests or Evolutions. The following cases illustrate when good intentions went awry, transforming IPTE into IPTE.
Millstone Unit 2
As detailed in Fission Stories #58, workers reduced the power level of the pressurized water reactor on February 11, 2011, to test valves on the main turbine. This test had not been conducted by this group of workers for many months. So, they went into the control room simulator and stepped through the procedure several times. In addition, a second worker was assigned to every worker with a hands-on task during the test to act as a peer checker. The checkers had the test procedure in their hands and were supposed to make sure that workers manipulated the right equipment at the right time exactly as specified in the procedure.
All those preparations went for naught. When a worker flipped the very first switch in the test, another worker reacted by flipping three other switches. That worker picked the right three switches specified in the procedure but flipped them all in the wrong direction. Additional mistakes compounded that first one and ultimately caused the power level to increase uncontrollably by nearly 8%.
Pilgrim
As detailed in Fission Stories #59, workers were starting up the boiling water reactor at the Pilgrim nuclear plant in May 2011 following a refueling outage. They withdrew control rods from the reactor to achieve a nuclear chain reaction and then continued withdrawing control rods to increase the power level. Concerned that the reactor water was heating up too quickly, the workers re-inserted five control rods. The rate at which the reactor water heats up and cools down is limited to 100°F per hour. This limit protects the metal reactor vessel from excessive stress (and, ultimately, from breaking apart) caused by expansion during heatup and contraction during cooldown.
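To make the 100°F per hour limit concrete, here is a minimal Python sketch that checks a series of reactor water temperature readings against it. The readings, the function names `rates_f_per_hour` and `exceeds_limit`, and the choice to compute an average rate between consecutive readings are all hypothetical illustrations. They are not data from the Pilgrim event or any plant's actual surveillance method; each plant evaluates the limit as its operating license and procedures specify.

```python
# Illustrative sketch only: hypothetical readings and a simplified check.
# Real plants evaluate heatup/cooldown limits per their own license and procedures.

LIMIT_F_PER_HOUR = 100.0  # heatup/cooldown limit cited in the text


def rates_f_per_hour(readings):
    """Average heatup (+) or cooldown (-) rate between consecutive readings.

    readings: list of (minutes since start, water temperature in °F) tuples,
    in time order. Returns one rate per interval, in °F per hour.
    """
    rates = []
    for (t0, temp0), (t1, temp1) in zip(readings, readings[1:]):
        hours = (t1 - t0) / 60.0
        rates.append((temp1 - temp0) / hours)
    return rates


def exceeds_limit(readings, limit=LIMIT_F_PER_HOUR):
    """True if any interval heats up or cools down faster than the limit."""
    return any(abs(rate) > limit for rate in rates_f_per_hour(readings))


# Hypothetical readings taken every 15 minutes during a heatup.
readings = [(0, 300.0), (15, 320.0), (30, 350.0), (45, 365.0)]
print(rates_f_per_hour(readings))  # [80.0, 120.0, 60.0]
print(exceeds_limit(readings))     # True
```

In this made-up example, the 30°F rise between minutes 15 and 30 works out to 120°F per hour, so the check flags it even though the other two intervals are under the limit. Checking short intervals this way is stricter than averaging over a full hour; which approach applies is a plant-specific matter the sketch does not settle.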
When the water temperature had leveled off, the workers re-withdrew the five control rods and also withdrew another one. By doing so, they lost control of the nuclear chain reaction. The reactor power level began doubling every 20 seconds. Safety systems automatically inserted all the control rods within seconds, aborting the startup and protecting the nuclear fuel.
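"Doubling every 20 seconds" describes exponential growth. As a purely arithmetic illustration, assuming the 20-second doubling time had held steady (it did not, because the safety systems intervened), the power level P after t seconds, starting from an initial level P_0, would follow:

$$
P(t) = P_0 \cdot 2^{t/T_d}, \qquad T_d = 20\ \text{s}
$$

$$
P(60\ \text{s}) = P_0 \cdot 2^{3} = 8P_0, \qquad P(120\ \text{s}) = P_0 \cdot 2^{6} = 64P_0
$$

The same growth is often expressed as a reactor period \(\tau\), the time for the power level to increase by a factor of e, so that \(P(t) = P_0 e^{t/\tau}\):

$$
\tau = \frac{T_d}{\ln 2} \approx \frac{20\ \text{s}}{0.693} \approx 28.9\ \text{s}
$$

Under that assumption, power would have risen eightfold in one minute and roughly 64-fold in two, which is why automatic protection must act within seconds.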
Callaway
On October 21, 2003, control room operators were reducing the power level of the pressurized water reactor at the Callaway nuclear plant in Missouri because a safety component had broken. The reactor’s operating license allowed the reactor to continue operating for only a few hours with this component broken; after that, the reactor had to be shut down. While maintenance workers attempted to repair the component, the control room operators reduced the power level in case the component could not be fixed in time.
The repair efforts failed. But before operators could shut down the reactor, it shut down by itself. Workers had inserted control rods into the reactor core to reduce the power level to less than 5% when two things conspired to stop the nuclear chain reaction. After workers tripped the main turbine and realigned its drain paths, the reactor water temperature rose by nearly 10°F. The reactor core is designed such that as water temperature increases, the reactor power level drops. In addition, the decreasing reactor power level during the controlled shutdown allowed xenon, a neutron poison, to build up in the nuclear fuel rods. The increasing xenon inventory further reduced the reactor power level. Together, these two factors interrupted the nuclear chain reaction and shut down the reactor.
But the operators then stopped inserting control rods. For 107 minutes, the reactor was kept shut down by water temperature and xenon alone. Had the water temperature dropped or the xenon inventory decreased (the inventory increases and decreases over time), the reactor could have restarted. Nearly two hours after the reactor shut itself down, the workers fully inserted the control rods to ensure the reactor could not restart itself. For much of that period, the instrumentation needed to monitor the power level was not turned on.
See pages 93-154 of this document for more details about this event.
Hope Creek
On March 17, 2003, workers were shutting down the boiling water reactor at the Hope Creek Generating Station in New Jersey. The workers had restarted the reactor following a maintenance outage but encountered an equipment problem when the power level reached about 15%. The reactor had to be shut down for the equipment problem to be corrected. The standard way to shut down the reactor was to depress two pushbuttons that caused the control rods to rapidly insert into the core, interrupting the nuclear chain reaction. In this case, the workers used a non-routine procedure and began slowly inserting the control rods one by one until the nuclear chain reaction was shut down. They opted for the path less traveled out of concern that a rapid shutdown of the reactor might cause the reactor water to cool down faster than the 100°F per hour safety limit.
The slow, controlled shutdown was progressing smoothly until the power level had been reduced to less than 7%. Then an additional turbine bypass valve suddenly and unexpectedly opened. Hope Creek had multiple bypass valves (labeled BPV in the graphic) that allowed steam produced in the reactor vessel to flow into the condenser without having to travel through the high pressure (HP) and low pressure (LP) turbines. At low power levels, too little steam is produced to properly spin the turbines, so the bypass valves provide a pathway for the steam. The additional open bypass valve let more steam flow from the reactor vessel, which in turn caused the water level inside the reactor vessel to drop by about 8 inches. By design, the condensate and feedwater pumps automatically responded to the drop in reactor vessel water level by increasing the makeup flow they supplied.
The good news is that the pumps quickly restored the water level to its desired setpoint. The bad news is that they did it with cool water drawn from the main condenser. Lots of cool makeup water entered the reactor vessel. In boiling water reactors, the nuclear cores are designed such that their power levels drop as the reactor water temperature increases. This safety feature essentially applies a “brake” on the nuclear chain reaction as the power level rises. But there’s a flip side to this feature: the power level increases when the water temperature decreases. In about 25 seconds, the cool water rushing into the reactor vessel more than doubled the reactor power level, from 6.5% to 13.5%.
The IPTE measures included briefings on how to conduct the slow, controlled shutdown but skipped the parts of the procedure explaining what workers were supposed to do if things went wrong. When things did go wrong, the workers did not take the proper response steps. Two wrongs definitely did not make it right.
Contrary to operating procedures, the workers rode out the uncontrolled reactor power rise and then resumed their controlled shutdown efforts. The plant conditions met the criteria in the procedures calling for immediately depressing the two pushbuttons to scram (rapidly shut down) the reactor, but the workers were unaware of those criteria and of the prescribed steps to take.
Safety by Intent
Despite these examples, IPTE efforts have successfully reduced the number of problems caused by workers manipulating the wrong equipment, manipulating the right equipment the wrong way, and skipping key steps in procedures.
But these examples, and others like them, show that IPTE is still AWIP: “a work in progress.” The IPTE measures can be made better. For example, workers at Millstone rehearsed on a control room simulator before their activity. But that rehearsal could test the checkers only if the workers with hands-on tasks made a mistake. Those workers made no mistakes during the rehearsal, so the checkers went essentially untested. And when the workers made mistakes during the real test, the checkers failed to realize mistakes had been made.
The rehearsal would have been more robust if the simulator instructors had directed workers to deliberately reach for the wrong switch or flip the right switch in the wrong direction, testing how well the checkers functioned as peers.
UCS’s Disaster by Design/Safety by Intent series of blog posts is intended to help readers understand how a seemingly unrelated assortment of minor problems can coalesce to cause disaster and how effective defense-in-depth can lessen both the number of pre-existing problems and the chances they team up.