Chair Force Engineer

Saturday, January 27, 2007

Rush to Disaster

January 27th marks the 40th anniversary of the Apollo I fire. Tomorrow is the 21st anniversary of the Challenger disaster, and Thursday will mark four years since Columbia was lost. Taken together, these losses represent NASA's greatest failings and give us all a reason to think about how we can do things better.

In the aftermath of each of the three disasters, schedule pressure was cited as a major factor. Management also took a great degree of blame. The message from the investigators is that, regardless of a system's technical limitations, it can be made safe through sound management processes.

Prior to Apollo I, the quest to land a man on the moon "before the decade was out" was speeding ahead full-throttle. If NASA's original schedule had been adhered to, a moon landing could have been accomplished in 1967 or 1968. The problem was that the launch of AS-204 (retroactively named Apollo I) had been delayed from 1966 into February 1967. Even at the time of the fire, the February launch date looked susceptible to further delays. In the end, astronauts Grissom, White and Chaffee died in what was supposed to be a "routine" test of a shoddily designed spacecraft in a poorly designed simulation. The aftermath of the accident brought a thorough redesign of the Apollo Block II spacecraft. The mission profile that "Apollo I" should have flown was taken by Apollo 7 in October 1968. The delay was 20 months. Had NASA and North American Aviation used those 20 months to design the capsule and the test procedures correctly, the moon might have been reached without loss of life and still before the end of 1969.

Schedule pressure was most apparent with the Challenger disaster. The shuttle was falling far short of the flight rate required to make the vehicle economical (a flight rate that could never have been supported by the hardware in the first place). The 1986 manifest already included missions that could have proven disastrous, such as the first shuttle launch from Vandenberg and the deployment of the Galileo probe with the Centaur upper stage. The conditions for disaster had already been set by the massive schedule slippage and the high expectations that were placed on the shuttle program. The final straw came when mid-level managers rejected arguments from engineers that the O-rings would fail in the freezing weather of January 28, 1986. The flight just couldn't sit on the pad for another day, or so the managers thought. Because they launched the shuttle on that particular day, the result was a lost crew and a 32-month delay before another shuttle could be launched. The disaster forced a wise and necessary rethink of the national launch strategy. New versions of the Delta, Atlas and Titan were developed in order to launch all payloads that did not rely on the shuttle's unique capabilities. The shuttle program adopted a flight rate that was more consistent with the budget and hardware limitations.

When the Columbia Accident Investigation Board released its final report, it placed what I felt to be an inflated importance on the role of schedule pressure. At the time of Columbia's January 16, 2003 launch, the space station was scheduled to be "US Core Complete" by February 2004. This date was very important to NASA administrator Sean O'Keefe, because he had gotten the job based on his commitment to avoiding further cost overruns and schedule delays on the space station. The board reasoned that this tight schedule deterred NASA management from redesigning the external tank after a piece of foam hit an SRB during the STS-112 mission. I feel that the schedule was a much smaller factor when compared to NASA's belief, borne out by Boeing's simulations, that falling foam was not a threat to the mission. Foam shedding had been observed throughout the shuttle program and was always felt to be a manageable risk. That belief was proven wrong the hard way, with the loss of Columbia's crew. Unlike Challenger, this was not a problem that could be fixed by placing constraints on the launch window. It required substantial redesign of the tank to avoid foam shedding, and modification of the orbiter to detect damage to the heat shield. It would be another 30 months before the space shuttle flew again, due to the redesign of the external tank. Following the return to flight, it took almost 12 more months to modify the external tank in a way that reduced foam shedding to acceptable levels. The space station will not be "US Core Complete" until 2008 at the earliest.

We should take many lessons from NASA's disasters. Most importantly, safety should be the main factor in determining when we do things, rather than the fantasy schedules that are political in nature and written by Kool-Aid drinkers. It's better to take a delay that doesn't kill anybody than to take the bigger delay that inevitably follows a fatal accident. Further, testing and simulation must be well thought out and realistic. Managers must never be afraid to ask for more simulations if they're unsure of the mission's success at a critical decision point. Finally, lives lost without lessons being applied to future missions are lives wasted. Future manned spaceflight programs must always remember the sacrifices of Apollo I, Challenger and Columbia. They were sacrifices that would not have been necessary if not for the human tendency to react to disaster instead of anticipating it.