Sizewell B's Control System

The fall-out from the Sizewell B report (Tony Collins of Computer Weekly)

In October 1993 Computer Weekly became the first publication in the UK to offer its readers the chance to acquire a secret government report in its entirety.

The decision raises issues of copyright and confidentiality, but it was felt the report was too important to be suppressed.

Written by the Nuclear Installation Inspectorate (NII), part of the Health and Safety Executive, the report asked whether the safety-critical protection software at the Sizewell B nuclear power station, due to open next year, can be trusted to perform predictably. It concluded that the Primary Protection System (PPS) software is safe - a contentious point, given the report's disclosure that the software had failed 52% of independent tests, failures it put down to faulty test equipment.

Since making the report available to readers, Computer Weekly understands that the NII is requiring Nuclear Electric, the operator of Sizewell B, to repeat some of the key tests.

Also, the fuel loading at Sizewell B has been delayed four months while Nuclear Electric tries to convince the NII that the reactor, associated plant, and the software are safe.

The PPS software is designed to shut down Sizewell B in the event of a significant failure. A controlled shut-down is vital to prevent a major leak of radioactivity. More complex and comprehensive than any in use at a UK nuclear power station, the software is backed up by a non-computerised protection system. But this does not cover every fault.

Some safety-critical software experts believe that the Sizewell system is too complex to be adequately tested. But Nuclear Electric says the software, when considered with other measures, offers the best possible protection system.

In a letters-page special, Computer Weekly published some of the views of readers who have read the NII report. We hope to publish a reaction from Nuclear Electric in the New Year.

Considering the hardware

From Roger Hill,

The report does not seem to consider the hardware of the Primary Protection System (PPS), only the software. For example, the PROM (Programmable Read-Only Memory) which holds the executable code is validated against the source code, but there is no assessment of the mean time between PROM faults, or the failure modes of the PROM.

There is no indication of what state the control computer is left in when there is a processor fault.

There is no indication in the report whether failures in external peripherals have been or are going to be tested. For example, what happens to the software if several sensors short out?

It appears that no attempt has been made to test the PPS as a whole; all that has been tested is the software. The hardware running the PPS has a finite reliable life which is less than the design life of the reactor. The continued development of computer hardware means that the expertise to maintain and understand the operation of the current hardware could disappear within the operational life of the reactor. How many maintenance engineers today could look after an IBM 360 to the sort of standard required for critical systems safety?

Therefore, within the design life of the reactor, the computer control system will have to be replaced, probably three or four times. The probability that these replacements could introduce errors does not appear to be considered. The initial control system will be checked as installed before the nuclear fuel is loaded.

Checking the replacement systems is not covered. Will the reactor be unloaded of all its nuclear fuel rods so that the replacement can go through full commissioning?

The probability that the process of commissioning a new computer control system on a live reactor will give rise to an accident does not appear to have been considered. Chernobyl was running down to annual engineering maintenance, not running live, when it blew up.
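The concern about repeated replacements can be made concrete with a small calculation. The sketch below is illustrative only: it assumes each replacement independently carries the same chance of introducing an undetected error, and the 1% figure is a made-up number, not one taken from the NII report. The function name is likewise hypothetical.

```python
def prob_any_faulty_replacement(p_per_replacement: float, replacements: int) -> float:
    """Chance that at least one of several replacements introduces an error.

    Assumes (for illustration only) that replacements are independent and
    that each has the same probability of introducing an undetected error.
    """
    if not 0.0 <= p_per_replacement <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if replacements < 0:
        raise ValueError("number of replacements cannot be negative")
    # Probability that every replacement is clean, subtracted from 1.
    return 1.0 - (1.0 - p_per_replacement) ** replacements


if __name__ == "__main__":
    # Hypothetical 1% risk per replacement, over three and four replacements.
    for n in (3, 4):
        print(n, round(prob_any_faulty_replacement(0.01, n), 4))
```

Whatever the true per-replacement figure, the risk compounds with each replacement, which is why the checking of replacement systems matters as much as the checking of the first installation.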

Another issue is that of the simulation model or test harness. The reactor system as a whole is too expensive and awkward to use as a means to test the computer control system. Quite rightly a simulation model of the reactor, known as the test harness, was built. This was used to test the computer system.

Unfortunately there is no record of how the simulation model was demonstrated to be an accurate representation of the reactor. Indeed all the errors so far checked have been ascribed to errors in the simulation model. The validation of the model and of the control system should have been carried out independently. By confusing the two, one outcome may be that the simulation model does not show up problems on the control system.

This is precisely why, in commercial data processing, program testing and system testing should be kept separate and should not use common data.

Given the appalling consequences of getting a nuclear reactor control system wrong it is difficult to understand why only one simulation model was built.

Human factors are a key issue. One of the key problems identified by the inquiry into the Three Mile Island accident was the difficulty the operators experienced in making the correct diagnosis of their problem.

Information overload causes confusion. So, to test the effectiveness of Sizewell's PPS, it is necessary to test it with the operators. Their errors, confusion and mistakes are part of the system. This is recognised, for example, by the designers of aircraft cockpits, who now routinely simulate their design proposals with pilots. This aspect of testing and simulation is absent.

Since 52% of the tests failed, the test data is going to be changed to fix these errors. Where looking at some errors in a set indicates a fault in the test harness, all faults in the set are assumed to be due to the test harness and are not inspected further. When the test harness is fixed, the full set of tests will not be rerun. The full set of tests takes six months to run. So where a bug in the harness obscured a bug in the control system, so that it was not detected in the first set of tests, this bug may not be picked up on the limited re-test.
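The masking effect described above can be shown with a toy model. This is a minimal sketch under stated assumptions, not a description of how the Sizewell tests were actually run: the `Case` record, the sampling rule and the test names are all hypothetical, chosen only to mirror the argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    harness_fault: bool  # the (hypothetical) harness misbehaves on this test
    control_fault: bool  # the system under test misbehaves on this test


def fails(case: Case, harness_fixed: bool) -> bool:
    """A test fails if either the unfixed harness or the control system is at fault."""
    return (case.harness_fault and not harness_fixed) or case.control_fault


def undetected_control_faults(cases, sample_size, retest_names):
    """Names of control-system faults that survive sampling plus a limited re-test."""
    failed = [c for c in cases if fails(c, harness_fixed=False)]
    # Triage: inspect only a sample; one harness fault blames the whole set.
    blamed_on_harness = any(c.harness_fault for c in failed[:sample_size])
    if not blamed_on_harness:
        return []  # in this model every failure is then inspected individually
    # Limited re-test with the harness fixed: only the listed tests are rerun.
    caught = {c.name for c in cases
              if c.name in retest_names and fails(c, harness_fixed=True)}
    return [c.name for c in cases if c.control_fault and c.name not in caught]


cases = [
    Case("T1", harness_fault=True, control_fault=False),
    Case("T2", harness_fault=True, control_fault=True),  # harness bug hides a real bug
    Case("T3", harness_fault=True, control_fault=False),
    Case("T4", harness_fault=False, control_fault=False),
]

limited = undetected_control_faults(cases, sample_size=1, retest_names={"T1"})
full = undetected_control_faults(cases, sample_size=1,
                                 retest_names={c.name for c in cases})
print(limited, full)
```

In this model the control-system fault in T2 escapes when only T1 is rerun, and is caught only when the whole set is rerun against the fixed harness.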

Personally I would not approve the PPS system going live even if it were just printing invoices. You can lose customers if you send them too many wrong invoices.

Roger Hill, Computer Weekly, November 1993.
