FAULT-TOLERANT ONBOARD PLD-SYSTEMS:
A SPACE-STRUCTURAL SIMULATION AND METHODS OF ADAPTATION
USHAKOV ANDREY ALEXANDROVICH, KHARCHENKO VYACHESLAV SERGEYEVICH
National Aerospace University named after N.E. Zhukovsky “KhAI”, Kharkiv, Ukraine. E-mail: A.Ushakov@csac.khai.edu; khaks@skynet.kharkov.com
Introduction
Modern aerospace systems impose exacting requirements on the reliability and survivability of on-board digital systems [1-4]. Programmable logic devices (PLDs) are convenient tools for designing reliable structures on a single chip [5-7]. The reliability of such PLD-based systems needs to be estimated and their durability forecast, taking into account the space factors that cause multiple failures of chip elements [8-11]. The dynamic reconfiguration capability of PLDs makes it possible to implement different variants of redundancy and to adapt to failures of different multiplicity [12-14].
The purpose of this paper is to present an overview of research results and practical developments in two areas: methods for ensuring fault tolerance, and PLD models based on the space-structural approach.
The suggested set of models
The space-structural system of analytical and simulation modeling is a tool for handling such issues [15].
The analytical space-structural model estimates the condition of PLD structures. It calculates the probability of an operating state over a time interval by whose end a number of element failures have been revealed. The modeling takes into account:
- the chip type;
- the architecture of the redundant system;
- the allocation and area of the different system parts;
- the configuration and multiplicity of the failing cells.
The simulation model draws its source data from two places. The first is information about the nature and consequences of failures, which is entered and obtained during project development in the MAX+PLUS II computer-aided design system. The second is the database of fault-tolerant PLD structures. From these data, the set of program models needed for simulation modeling is created.
The initial phase of modeling is the software model of the PLD. First, the chip's logical floorplan is built from the database of chips. The elements of this floorplan are elementary cells of known capacity: logic cells, embedded RAM cells and fixed service parts of the PLD. The next phase is the program model of the PLD-based digital device. Using the prepared project routing file, the used cells, their links to other cells and the pins are marked on the logical floorplan.
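The two program models described above can be illustrated with a minimal Python sketch. It represents a logical floorplan as a grid of logic cells and marks the cells used by each functional area. This is a hypothetical model: the grid shape, the rectangle-based placement and the names `Floorplan` and `place_area` are assumptions for illustration. They do not describe the authors' chip database or the format of MAX+PLUS II routing files.

```python
from dataclasses import dataclass, field

FREE = "free"  # cell not used by the project

@dataclass
class Floorplan:
    """Hypothetical logical floorplan: a width x height grid of logic cells,
    each tagged with the name of the functional area that uses it."""
    width: int
    height: int
    cells: dict = field(default_factory=dict)  # (x, y) -> area name

    def __post_init__(self):
        # Start with every cell unused.
        for x in range(self.width):
            for y in range(self.height):
                self.cells[(x, y)] = FREE

    def place_area(self, name, x0, y0, w, h):
        """Mark a w x h rectangle of free cells as used by area `name`;
        raise ValueError if any of those cells is already used."""
        for x in range(x0, x0 + w):
            for y in range(y0, y0 + h):
                if self.cells[(x, y)] != FREE:
                    raise ValueError(f"cell {(x, y)} already used")
                self.cells[(x, y)] = name

    def area_size(self, name):
        """Number of cells currently assigned to area `name`."""
        return sum(1 for a in self.cells.values() if a == name)

# Assumed example: a 16 x 18 grid (288 cells) holding a duplex structure.
fp = Floorplan(16, 18)
fp.place_area("channel1", 0, 0, 8, 9)   # 72 cells
fp.place_area("channel2", 8, 0, 8, 9)   # 72 cells
fp.place_area("cdr1", 0, 9, 8, 1)       # 8 cells of checking means
fp.place_area("cdr2", 8, 9, 8, 1)       # 8 cells of checking means
fp.place_area("switch", 0, 10, 15, 1)   # 15 cells
print(fp.area_size("channel1"), fp.area_size(FREE))  # prints: 72 113
```

With a real design, the rectangles would be replaced by cell lists read from a routing report. That gives irregular areas without changing the rest of the model.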
The failure emulator in the physical model of the PLD-based digital device is built from information about the nature of failures. The emulator can be configured for four failure types:
- single failures;
- multiple independent failures;
- multiple cluster failures of spot shape;
- multiple cluster failures of three-dimensional converging pyramidal shape.
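A failure emulator of this kind can be sketched as functions that return the set of failing cells on a width x height floorplan. The sketch makes two assumptions. Independent failures are drawn as distinct, uniformly chosen cells. A spot cluster is modeled as a square of cells around a random centre; the spot shape actually used by the authors is not specified in the text. A single failure is the case `k = 1`. The three-dimensional pyramidal clusters are not covered here.

```python
import random

def independent_failures(width, height, k, rng):
    """Multiple independent failures: k distinct cells chosen uniformly at random."""
    cells = [(x, y) for x in range(width) for y in range(height)]
    return set(rng.sample(cells, k))

def spot_cluster(width, height, radius, rng):
    """Assumed spot-shaped cluster: every cell within `radius` rows and columns
    of a random centre, clipped at the chip edges."""
    cx, cy = rng.randrange(width), rng.randrange(height)
    return {(x, y)
            for x in range(max(0, cx - radius), min(width, cx + radius + 1))
            for y in range(max(0, cy - radius), min(height, cy + radius + 1))}

rng = random.Random(1)
single = independent_failures(16, 18, 1, rng)    # single failure
multiple = independent_failures(16, 18, 3, rng)  # three independent failures
spot = spot_cluster(16, 18, 1, rng)              # up to 3 x 3 neighbouring cells
print(single, multiple, spot)
```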
The testing phase of the PLD program model is based on the analysis results of the previous models and on the database of fault-tolerant PLD-based digital devices. A series of trials was carried out in which failures were emulated up to a complete fault of one channel and then of the digital device as a whole.
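The trial series can be approximated by a Monte Carlo loop that keeps injecting distinct, uniformly placed failures until the device fails. The sketch rests on several assumptions:
- It models a majority (triplex) structure.
- Cell counts follow experiment 1 of Table 4: three 72-cell channels and a 22-cell voter on a 288-cell chip.
- The failure rule is hypothetical: the device fails as soon as the voter is hit, or as soon as two different channels contain failures.

The function names are illustrative only.

```python
import random

def failures_to_device_fault(labels, rng):
    """Inject distinct, uniformly placed failures one at a time and return the
    number of failures at which the majority device stops operating.
    Assumed rule: the voter is hit, or two different channels contain failures."""
    failed, hit_channels = set(), set()
    while len(failed) < len(labels):
        cell = rng.randrange(len(labels))
        if cell in failed:
            continue                      # a cell can fail only once
        failed.add(cell)
        area = labels[cell]
        if area == "voter":
            return len(failed)
        if area.startswith("ch"):
            hit_channels.add(area)
            if len(hit_channels) == 2:
                return len(failed)
    return None                           # device never failed (no critical cells)

def survival_curve(labels, trials, max_k, seed=0):
    """Estimated P(device still operates after k failures) for k = 1..max_k."""
    rng = random.Random(seed)
    faults = [failures_to_device_fault(labels, rng) for _ in range(trials)]
    return [sum(1 for f in faults if f is None or f > k) / trials
            for k in range(1, max_k + 1)]

# Experiment 1 of Table 4: three 72-cell channels and a 22-cell voter on 288 cells.
labels = (["ch1"] * 72 + ["ch2"] * 72 + ["ch3"] * 72
          + ["voter"] * 22 + ["free"] * (288 - 3 * 72 - 22))
print(survival_curve(labels, 10_000, 4))
```

The estimate for one failure can be checked by hand. Under the assumed rule, a single failure disables the device only when it lands in the voter, so the expected value is 266/288.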
Database of fault-tolerant structures
At the primary phase of analytical modeling, redundant structures implemented in a single chip are considered. The database is developed using the following classification attributes [16, 17]:
1) the method of redundancy. Analytical modeling was carried out for non-redundant (single-channel), duplex and triplex (majority) structures [18-20];
2) presence or absence of checking, diagnosis and reconfiguration means;
3) individual or common inputs/outputs to different functional areas of structures.
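The three classification attributes above can serve as the key of the structure database. The minimal sketch below enumerates every combination of the attributes. The type names (`Redundancy`, `Pins`, `StructureClass`) are hypothetical. The paper itself analyzes six particular structures (Tables 1-3), not the full set of combinations.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class Redundancy(Enum):
    """Method of redundancy; the value is the number of processing channels."""
    NONE = 1      # non-redundant (one channel)
    DUPLEX = 2
    TRIPLEX = 3   # majority structure

class Pins(Enum):
    """Inputs/outputs of the functional areas."""
    COMMON = "common"
    INDIVIDUAL = "individual"

@dataclass(frozen=True)
class StructureClass:
    redundancy: Redundancy
    has_cdr: bool    # checking, diagnosis and reconfiguration means present
    pins: Pins

# Every combination of the three attributes (3 x 2 x 2 classes).
ALL_CLASSES = [StructureClass(r, c, p)
               for r, c, p in product(Redundancy, (False, True), Pins)]
print(len(ALL_CLASSES))  # prints: 12
```

The classes are frozen dataclasses, so each one is hashable. That lets a class serve directly as a dictionary key for the database entries.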
The six structures considered in analytical modeling are described in Tables 1-3.
Analytical modeling assumptions. The chip configuration controller was not considered in analytical modeling. It operates only while the configuration is being loaded and does not affect the chip during normal operation.
The following cell types are considered as failure objects (elementary logical structures of failure):
1) the configurable logic block (CLB) with the associated configuration cell;
2) input/output blocks with attached chip pins;
3) distributed RAM cells.
CLBs are considered together with configuration cells because the configuration cell latches are distributed among the CLBs. CLB failures and configuration cell failures are therefore regarded as equivalent events. If the failing block lies within the area of an internal processing channel (IPC), the event causes a failure of that channel. If it lies outside the IPC area, it does not affect the operability of the device. Failures of pins and failures of I/O blocks are also equivalent events, because the two elements are connected in series.
The failure consequences of any of the three elementary PLD structures accepted for analytical modeling are equivalent. This follows from the fact that only the reaction of the redundant structure with respect to its availability is considered.
Failures are distributed uniformly: every cell is equally likely to fail.
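Under these assumptions, the probability of non-failure operation of a non-redundant structure has a simple closed form. Suppose k distinct cells fail on a chip of N cells, with every set of k cells equally likely. The device survives only when every failure misses the A cells whose failure disables it, so P(k) = C(N - A, k) / C(N, k).

The sketch below computes this. Treating the channel together with its checking means as the critical area is an assumption for illustration. The authors' own formulas may distinguish these areas differently.

```python
from math import comb

def p_nonredundant(total_cells, critical_cells, k):
    """P(operating state) after k distinct failures placed uniformly over
    total_cells, when any failure inside critical_cells disables the device."""
    if not 0 <= k <= total_cells:
        raise ValueError("k must be between 0 and the number of cells")
    return comb(total_cells - critical_cells, k) / comb(total_cells, k)

# Assumed example: XC95288XL (288 cells), a 72-cell channel plus 8 cells of
# checking, diagnosis and reconfiguration means, all treated as critical.
for k in range(1, 5):
    print(k, round(p_nonredundant(288, 80, k), 4))
```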
R&I, 2003, N 3
Table 1. The description of non-redundant structures
Structure 1: non-redundant structure with checking, diagnosis and reconfiguration means and a common set of pins which feeds information to the checking, diagnosis and reconfiguration means.
Structure 2: non-redundant structure with checking, diagnosis and reconfiguration means and an individual set of pins which feeds information to the checking, diagnosis and reconfiguration means.
[Graphic symbols: block diagrams showing data inputs N_IN and reconfiguration control inputs/outputs N_CON]
Table 2. The description of duplex structures
Structure 3: duplex structure with checking, diagnosis and reconfiguration means, an individual set of pins to each information processing channel, internal duplication of data inputs to the checking, diagnosis and reconfiguration means, and individual checking/reconfiguration pins to each channel.
Structure 4: duplex structure with checking, diagnosis and reconfiguration means, an individual set of pins to each information processing channel, an individual set of input pins to each data processing channel, an individual set of input pins to the checking, diagnosis and reconfiguration means, and individual checking/reconfiguration pins to each channel.
[Graphic symbols: block diagrams of a user-customizable logic area containing data processing channels 1 and 2. Each channel has an internal processing channel (N_ch cells) and checking, diagnosis and reconfiguration means (N_SP cells). The diagrams also show data inputs N_IN and reconfiguration control inputs/outputs N_CON.]
PLD chips from different manufacturers with equal logical capacity are indistinguishable in analytical modeling, because routing resources were not considered.
The structure of model
A typical analytical model includes:
1) a spatial model of the structure with failure variants;
2) an analytical expression for calculating reliability indices;
3) a study of these reliability indices as functions of system parameters.
Modeling produced the following results:
1) the probability of non-failure operation as a function of the areas occupied by the different constituents of the FPGA structure;
2) the probability of non-failure operation as a function of the redundancy structure of the device;
3) the probability of non-failure operation as a function of failure multiplicity and the spatial placement of failures;
4) a comparison of the triple-channel majority structure and the duplex structure, depending on the FPGA resources allocated to the special-purpose part of the structure.
Results of modeling
An example of how the probability of non-failure operation depends on the number of failures is given below.
For each redundant structure, the chip was chosen according to the structure that occupies the largest share of the chip's logical capacity (Table 4). Structure complexity can be estimated both from the number of used logic cells and from the number of inputs/outputs. The comparison assumes that all internal processing channels occupy equal shares of capacity, both within one structure and across different structures. In terms of logic cells, the majority structures are the most complex because they have three channels.
The duplex structure is second in complexity. We also make the following sizing assumptions:
- One internal processing channel occupies between 15% and 25% of the chip's logic cells.
- The checking, diagnosis and reconfiguration means occupy between 10% and 25% of a channel.
- One set of inputs or outputs occupies between 10% and 15% of the chip's pins.

If the checking, diagnosis and reconfiguration means occupied 50% of a channel's logic cells, the duplex structure would be as complex as the majority structure. However, the majority voter is more complex than the channel switch. The size of both depends on the number of channels and on the number of device pins. In terms of used inputs/outputs, one duplex variant occupies the largest share of the chip: the one with individual inputs to the internal processing channels and individual inputs to the checking, diagnosis and reconfiguration means. The majority voter occupies a number of logic cells equal to 75% of the pins in one set of inputs or outputs, and the channel switch 50% (Table 4). The reconfiguration control inputs/outputs and the checking/diagnosis outputs occupy a fixed 4 cells. The number of these outputs is fixed because the analytical formulas only cover variants in which an area occupies 4 or more cells.
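The 50% break-even point follows from a simple cell count. Assume each duplex channel carries its own checking, diagnosis and reconfiguration means of size s·n_ch, where n_ch is the number of cells in one internal processing channel. The majority voter and the channel switch are left aside:

```latex
\underbrace{2\,(n_{ch} + s\,n_{ch})}_{\text{duplex}} = \underbrace{3\,n_{ch}}_{\text{majority}}
\;\Longrightarrow\; 2(1+s) = 3
\;\Longrightarrow\; s = 0.5
```

With s between 0.10 and 0.25, as assumed above, the duplex structure needs 2.2 to 2.5 channel-equivalents of logic cells, against 3 for the majority structure.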
The modeling used logical capacity data for Xilinx XC9500XL CPLDs and Virtex-II FPGAs. The XC95288XL chip has 288 logic cells and 192 pins. The XC2V8000 chip has 104832 logic cells and 1108 pins.
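Every entry of Table 4 can be reproduced from these chip capacities and the percentages given above, assuming each share is rounded up to the next whole cell or pin. This rounding rule is inferred from the table rather than stated by the authors, and so are the 50% and 75% factors for the channel switch and the majority voter. Integer arithmetic is used so that no floating-point error can shift a rounded value.

```python
def ceil_percent(percent, n):
    """ceil(percent * n / 100) computed exactly with integers."""
    return -(-percent * n // 100)

def allocate(cells, pins, channel_pct, io_pct):
    """Cells/pins per functional area under the assumed rounding-up rule."""
    channel = ceil_percent(channel_pct, cells)
    io_set = ceil_percent(io_pct, pins)
    return {
        "channel": channel,                  # internal processing channel, cells
        "io_set": io_set,                    # one set of inputs or outputs, pins
        "cdr_10": ceil_percent(10, channel), # checking means at 10% of channel
        "cdr_25": ceil_percent(25, channel), # checking means at 25% of channel
        "switch": ceil_percent(50, io_set),  # channel switch, cells
        "voter": ceil_percent(75, io_set),   # majority voter, cells
    }

# (chip cells, chip pins, channel share %, I/O set share %) per experiment
EXPERIMENTS = {
    1: (288, 192, 25, 15),
    2: (288, 192, 15, 15),
    3: (288, 192, 25, 10),
    4: (104832, 1108, 25, 15),
    5: (104832, 1108, 15, 15),
    6: (104832, 1108, 25, 10),
}

for number, params in EXPERIMENTS.items():
    print(number, allocate(*params))
```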
Analytical formulas for the probability of non-failure operation with a varying number of failures were obtained for these data.
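The formulas themselves are not reproduced in this text. As a hedged illustration of their likely form, the sketch below counts failure placements for a majority structure under the uniform-failure assumption. It uses a hypothetical survival rule: the voter must be intact, and at most one of the three equal channels may contain failures. For k = 1 the result reduces to (N - V)/N, where V is the number of voter cells. The function name `p_majority` is illustrative.

```python
from math import comb

def p_majority(total, channel, voter, k):
    """P(majority structure operates) after k distinct, uniformly placed failures.
    Assumed rule: no failure in the voter and failures in at most one of the
    three equal channels; cells outside the structure are harmless."""
    rest = total - 3 * channel - voter
    if rest < 0:
        raise ValueError("structure does not fit into the chip")
    favourable = comb(rest, k)  # all k failures outside the structure
    # j >= 1 failures inside exactly one channel (3 choices), the rest outside
    favourable += 3 * sum(comb(channel, j) * comb(rest, k - j)
                          for j in range(1, k + 1))
    return favourable / comb(total, k)

# Experiment 1 of Table 4: 288 cells, 72-cell channels, 22-cell voter.
for k in range(1, 5):
    print(k, round(p_majority(288, 72, 22, k), 4))
```

Setting `channel=0` and `voter=A` makes the function collapse to the non-redundant expression C(N - A, k) / C(N, k).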
Figures 1-6 plot the probability of non-failure operation against the number of failures for the XC95288XL chip. Each figure uses a different share of logical capacity for the checking, diagnosis and reconfiguration means.
Table 4. The sharing of the capacity of the chip between different functional areas of the structures

Experiment | Chip type | Number of cells | Number of pins | Internal processing channel | Set of inputs or outputs | Checking, diagnosis and reconfiguration means (10% / 25% of channel) | Channel switch | Majority voter
1 | XC95288XL | 288 | 192 | 25% - 72 cells | 15% - 29 pins | 8 / 18 cells | 15 cells | 22 cells
2 | XC95288XL | 288 | 192 | 15% - 44 cells | 15% - 29 pins | 5 / 11 cells | 15 cells | 22 cells
3 | XC95288XL | 288 | 192 | 25% - 72 cells | 10% - 20 pins | 8 / 18 cells | 10 cells | 15 cells
4 | XC2V8000 | 104832 | 1108 | 25% - 26208 cells | 15% - 167 pins | 2621 / 6552 cells | 84 cells | 126 cells
5 | XC2V8000 | 104832 | 1108 | 15% - 15725 cells | 15% - 167 pins | 1573 / 3932 cells | 84 cells | 126 cells
6 | XC2V8000 | 104832 | 1108 | 25% - 26208 cells | 10% - 111 pins | 2621 / 6552 cells | 56 cells | 84 cells

Conclusion. The following results have been obtained:
- On the FPGA architecture, the duplex structure has the highest probability of non-failure operation, provided that the checking, diagnosis and reconfiguration means occupy less than 35% of the channel's logical capacity.
- On the CPLD architecture, the duplex structure is inferior to the majority structure with a common set of inputs when two failures occur, if the checking, diagnosis and reconfiguration means occupy less than 12.5-20% of the channel's logical capacity.
- At 16%, the majority structure is inferior to the non-redundant structure with a common set of inputs when three failures occur.
- The majority structures have a significant advantage in the case of a single failure.
- Reducing the area of the internal processing channel increases the probability of non-failure operation. This is especially significant for the majority and non-redundant structures with a common input.
- Reducing the number of input/output pins favors the majority structure.

Hence, the modern FPGA element base and existing redundancy techniques can significantly increase the fault tolerance of airborne digital systems.
Figure 1. The probability of non-failure operation versus the number of failures, with checking, diagnosis and reconfiguration means occupying 10% of the logical capacity of the channel (experiment 1)
Figure 2. The probability of non-failure operation versus the number of failures, with checking, diagnosis and reconfiguration means occupying 25% of the logical capacity of the channel (experiment 1)
Figure 3. The probability of non-failure operation versus the number of failures, with checking, diagnosis and reconfiguration means occupying 10% of the logical capacity of the channel (experiment 2)
Figure 4. The probability of non-failure operation versus the number of failures, with checking, diagnosis and reconfiguration means occupying 25% of the logical capacity of the channel (experiment 2)
Figure 5. The probability of non-failure operation versus the number of failures, with checking, diagnosis and reconfiguration means occupying 10% of the logical capacity of the channel (experiment 3)
Figure 6. The probability of non-failure operation versus the number of failures, with checking, diagnosis and reconfiguration means occupying 25% of the logical capacity of the channel (experiment 3)

References
1. Habinc S., Sinander P. ASIC Design and Manufacturing Requirements. European Space Agency and European Space Research and Technology Centre. WDN/PS/700, Issue 2, October 1994.
2. Fernandez-Leon A., Pouponnot A., Habinc S. ESA FPGA Task Force: Lessons Learned // Proc. of MAPLD Conference, Maryland, USA, 2002.
3. Ziegler J.F., Curtis H.W., Muhlfeld H.P. et al. IBM experiments in soft fails in computer electronics (1978-1994) // Journal of Research and Development. Terrestrial Cosmic Rays and Soft Errors. Vol. 40, No. 1, 1998.
4. NASA Preferred Reliability Practices. Environmental Factors. Practice No. PD-EC-1101. http://www.hq.nasa.gov.
5. Hartenstein R. A Decade of Reconfigurable Computing: a Visionary Retrospective // Proc. of Design, Automation and Test in Europe Conference (DATE-2001), Munich, Germany, March 2001. P. 642-649.
6. Bergmann N.W., Dawood A.S. Reconfigurable Computers in Space: Problems, Solutions and Future Directions // Proc. of MAPLD Conference, Maryland, USA, 1999.
7. Weigand D., Harlacher M. Design of a Radiation-Tolerant Low-Power Transceiver // Proc. of MAPLD Conference, Maryland, USA, 2001.
8. Alfke P., Padovani R. Radiation Tolerance of High-Density FPGAs // Proc. of MAPLD Conference, Maryland, USA, 1998.
9. Fuller E., Caffrey M., Blain P., Carmichael C., Khalsa N., Salazar A. Radiation test results of the Virtex FPGA and ZBT SRAM for Space Based Reconfigurable Computing // Proc. of MAPLD Conference, Maryland, USA, 1999.
10. Mavis D.G., Eaton P.H. SEU and SET Mitigation Techniques for FPGA Circuit and Configuration Bit Storage Design // Proc. of MAPLD Conference, Maryland, USA, 2000.
11. Fuller E., Caffrey M., Salazar A., Carmichael C., Fabula J. Radiation Characterization, and SEU Mitigation, of the Virtex FPGA for Space-Based Reconfigurable Computing. http://www.xilinx.com.
12. Doumar A., Barboucha M., Ito H. Fault Tolerance SRAM based FPGAs by shifting the Configuration Data // Proc. of MAPLD Conference, Maryland, USA, 1999.
13. Fekete S.P., Kohler E., Teich J. Optimal FPGA Module Placement with Temporal Precedence Constraints // Proc. of Design, Automation and Test in Europe Conference (DATE-2001), Munich, Germany, March 2001. P. 657-665.
14. Kharchenko V.S., Sklyar V.V. On-Board Device and System Architectures with the Version-Threshold Adaptation to Hardware and Software Faults // Proc. of MAPLD Conference, Maryland, USA, 2002.
15. Ushakov A.A., Kharchenko V.S. The technique of simulation and choice of PLD-based reliable embedded control system structures. Kharkiv State Technical University of Agriculture, Kharkiv, Ukraine, 2002. P. 401-405.
16. Kharchenko V.S., Tarasenko V.V. Multiversion Design Technologies of Onboard Fault-tolerant FPGA Devices // Proc. of MAPLD Conference, Maryland, USA, 13-15 September 2001.
17. Ushakov A.A., Kharchenko V.S. Requirements and architectural solutions of PLD-based reliable embedded real-time control systems // Proc. of International Scientific and Technical Conference "Integrated Computer Technology in Engineering", Kharkiv, Ukraine, 2002. P. 102-106.
18. Carmichael C. Triple Module Redundancy Design Techniques for Virtex FPGAs. XAPP197 (v1.0), November 1, 2001.
19. Brinkley P. (Avnet), Carmichael C. SEU Mitigation Design Techniques for the XQR4000XL. XAPP181 (v1.0), March 15, 2000.
20. Andraka R.J., Brady P.E., Brady J.L. A Low Complexity Method for Detecting Configuration Upset in SRAM Based FPGAs // Proc. of MAPLD Conference, Maryland, USA, 20.