RAMP Simulation Software for Modelling Reliability, Availability and Maintainability
RAMP Simulation Software for Modelling Reliability, Availability and Maintainability (RAM) is a computer software application developed by WS Atkins specifically for the assessment of the reliability, availability, maintainability and productivity characteristics of complex systems that would otherwise prove too difficult, cost too much or take too long to study analytically. The name RAMP is an acronym standing for Reliability, Availability and Maintainability of Process systems.
RAMP models reliability using failure probability distributions for system elements, as well as accounting for common mode failures. RAMP models availability using logistic repair delays caused by shortages of spare parts or manpower, and their associated resource conditions defined for system elements. RAMP models maintainability using repair probability distributions for system elements, as well as preventive maintenance data and fixed logistic delays between failure detection and repair commencement.
RAMP consists of two parts:
- RAMP Model Builder. A front-end interactive graphical user interface (GUI).
- RAMP Model Processor. A back-end discrete-event simulation that employs the Monte Carlo method.
RAMP Model Builder
The RAMP Model Builder enables the user to create a block diagram describing the dependency of the process being modelled on the state of individual elements in the system.
Elements are the basic building blocks of a system modelled in RAMP and can have user-specified failure and repair characteristics in the form of probability distributions, typically of Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR) values respectively, chosen from the following:
- Weibull: Defined by scale and shape parameters (or optionally 50th and 95th percentiles for repairs).
- Negative exponential: Defined by mean.
- Lognormal: Defined by median and dispersion (or optionally 50th and 95th percentiles for repairs).
- Fixed (Uniform): Defined by a maximum time to failure or repair.
- Empirical (user-defined): Defined by a multiplier.
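As an illustration of how times to failure or repair might be drawn from three of these distributions, the following is a minimal sketch using Python's standard `random` module. It is not RAMP's own code: the helper name `sample_time`, its parameter names, and the reading of the lognormal 'dispersion' as the standard deviation of the logarithm of the time are all assumptions made for this example. The Fixed and Empirical options are omitted because their exact definitions are not given here.

```python
import math
import random


def sample_time(dist, rng, **params):
    """Draw one time to failure or repair (hypothetical helper, not RAMP's API)."""
    if dist == "weibull":
        # random.weibullvariate takes (scale, shape) in that order.
        return rng.weibullvariate(params["scale"], params["shape"])
    if dist == "exponential":
        # The negative exponential is defined by its mean; the rate is 1 / mean.
        return rng.expovariate(1.0 / params["mean"])
    if dist == "lognormal":
        # The median of a lognormal is exp(mu), so mu = ln(median).
        # Assumption: 'dispersion' is sigma, the std. dev. of ln(time).
        return rng.lognormvariate(math.log(params["median"]), params["sigma"])
    raise ValueError(f"unsupported distribution: {dist}")


rng = random.Random(42)  # seeded so that a run is repeatable
time_to_failure = sample_time("weibull", rng, scale=1000.0, shape=1.5)
time_to_repair = sample_time("lognormal", rng, median=8.0, sigma=0.5)
```

A seeded generator is used so that a model run can be repeated exactly, which is helpful when comparing design alternatives.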
Elements can represent any part of a system from a specific failure mode of a minor component (e.g. isolation valve fails open) to major subsystems (e.g. compressor or power turbine failure) depending on the level and detail of the analysis required.
RAMP allows the user to define deterministic elements which are failure free and/or are unrepairable. These elements may be used to represent parameters of the process (e.g. purity of feedstock or production demand at a particular time) or where necessary in the modelling logic (e.g. to provide conversion factors).
Each element of the model has a user-defined process 'q value' representing a parameter of interest (e.g. mass flow, generation capacity etc.). Each element is considered to be either operating or not operating and has associated performance values q = Q or q = 0 respectively. The interpretation of each 'q value' in the model depends on the parameter of interest being modelled, which is typically chosen during the system analysis stage of model design.
Elements with interacting functionality can be organised into groups. Groups can be further combined (to any depth) to produce a Process Dependency Diagram (PDD) of the system, which is similar to a normal reliability block diagram (RBD) commonly used in reliability engineering, but also allows complex logical relationships between groups and elements to permit a more accurate representation of the process being modelled. The PDD should not be confused with a flow diagram since it describes dependency, not flow. For example, an element may appear in more than one position in the PDD if this is required to represent the true dependency of the process on that element. Groups may also be shown in full or may be compressed to allow the screen to show other areas to greater resolution.
Each group can be one of eleven group types, each with its own rule for combining 'q values' of elements and/or other groups within it to produce a 'q value' output. Groups thus define how the behaviour of each element affects the reliability, availability, maintainability and productivity of the system. The eleven group types are divided into two classes:
Five 'Flow' group types:
- Minimum (M): qM = min[q1, q2,...qn]
- Active Redundant (A): qA = min[Rating, (q1 + q2 + ... + qn)] unless qA < Cut-off, then qA = 0
- Standby Redundant (S): qS = as for Active Redundant, but where the first component is always assumed to be duty equipment.
- Time (T): qT = 0 if component with 'q value' q1 is in a "down" state when time through mission t < t0, otherwise qT = q1 + ... + qm if component with 'q value' q1 is in an "up" state when time t ≥ t0 + (m-1) x Time Delay, where m = 1 to n.
- Buffer (B): if the buffer is not empty qB = q2 else qB = min[q1,q2], where the buffer empties as output if component with 'q value' q2 is in an "up" state with level at time 0 = Initial Level, otherwise level at time t = level at time (t-1) - (q2 - q1), and the buffer fills as input if component with 'q value' q2 is in a "down" state with level at time 0 = Initial Level, otherwise level at time t = Capacity if level at time (t-1) + q1 > Capacity, otherwise level at time t = level at time (t-1) + (q2 - q1). Buffer input and output may also be limited by buffer constraints.
Six 'Logic' group types:
- Product (P): qP = q1 x q2 x ... x qn
- Quotient (Q): qQ = q1 / q2
- Conditionally Greater Than (G): if q1 > q2 then qG = q1 else qG = 0
- Conditionally Less Than (L): if q1 < q2 then qL = q1 else qL = 0
- Difference (D): qD = max[q1 - q2, 0]
- Equality (E): qE = q1 if q1 lies outside the range PA to PB, qE = q2 if q1 lies inside the range PA to PB
Three group types (Active Redundant, Standby Redundant and Time) are displayed in parallel configurations (vertically down the screen). All others are displayed in series configurations (horizontally across the screen).
Six group types (Buffer, Quotient, Conditionally Greater Than, Conditionally Less Than, Difference and Equality) contain exactly two components with 'q values' q1 and q2. All others contain two or more components with 'q values' q1, q2 to qn.
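The stateless combination rules listed above can be sketched as plain functions. This is an illustrative sketch only, not RAMP's implementation: the function names are hypothetical, the Equality range is assumed to include its end points, and the Time, Standby Redundant and Buffer groups are left out because they depend on simulation time or stored state.

```python
import math


def minimum(qs):
    """Minimum (M): the group output is limited by its weakest component."""
    return min(qs)


def active_redundant(qs, rating, cut_off=0.0):
    """Active Redundant (A): summed output capped at the rating, zero below the cut-off."""
    q = min(rating, sum(qs))
    return 0.0 if q < cut_off else q


def product(qs):
    """Product (P): q1 x q2 x ... x qn."""
    return math.prod(qs)


def quotient(q1, q2):
    """Quotient (Q): q1 / q2. Behaviour for q2 = 0 is not stated in the text,
    so this sketch simply lets Python raise ZeroDivisionError."""
    return q1 / q2


def conditionally_greater(q1, q2):
    return q1 if q1 > q2 else 0.0


def conditionally_less(q1, q2):
    return q1 if q1 < q2 else 0.0


def difference(q1, q2):
    return max(q1 - q2, 0.0)


def equality(q1, q2, pa, pb):
    """Equality (E): q2 if q1 lies inside [pa, pb] (inclusive by assumption), else q1."""
    return q2 if pa <= q1 <= pb else q1


# Two pumps of capacity 60 in an Active Redundant group rated at 100,
# placed in series (Minimum group) with a valve that can pass 80.
pumps = active_redundant([60.0, 60.0], rating=100.0)  # min(100, 120) = 100.0
line = minimum([pumps, 80.0])                         # 80.0
```

Because groups can be nested to any depth, the output of one rule is simply passed as a component 'q value' to the rule of the enclosing group, as in the last two lines.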
An element may be in one of five possible states and its 'q value' is determined by its state:
- Undergoing preventive maintenance (q = 0).
- Being repaired following failure, including queueing for repair (q = 0).
- Failed but undetected, dormant failure (q = 0). (e.g. standby equipment unavailable in the event of failure of duty equipment. Thus a problem may not be apparent until a failure of the duty equipment occurs.)
- Up but passive, available but not being used (q = 0). (e.g. standby equipment available in the event of failure of duty equipment.)
- Up and active, being used (q = Q > 0). (i.e. operating as intended.)
Occurrence of a state transition for an element is determined largely by the user-defined parameters for that element (i.e. its failure and repair distributions and any preventive maintenance cycles).
Element resource and repair conditions
There is often a time delay between an element failing and the commencement of its repair. This may be caused by a lack of spare parts, the unavailability of manpower, or dependencies on other elements that prevent the repair (e.g. a pump cannot be repaired because the isolating valve is defective and cannot be closed). In all of these cases, the element must be queued for repair. RAMP allows the user to define multiple resource conditions per element, all of which must be satisfied to allow a repair to be commenced. Each resource condition is one of five types:
- Repair Trade: a specified number of a repair trade must be available.
- Spare: a specified number of a spare part must be available.
- Group Q Value: a specified group must satisfy a condition regarding its 'q value'.
- Buffer Level: a specified buffer must satisfy a condition regarding its level.
- Element State: a specified element must satisfy a condition regarding its state.
Repair trades repair condition
Repair trades can be specified for the repair of any element, and they represent manpower in the form of a set of skilled maintenance workers with a particular trade. A repair trade can be used for the duration of an element repair (i.e. logistic delay plus a time value drawn from the element repair distribution). On completion of the repair, the repair trade becomes available to repair another element. The number of repairs which can be performed simultaneously for elements requiring a particular repair trade depends on the number of repair trade resources allocated and the number of that repair trade specified as a requirement for the repair.
Spares repair condition
If a spare part is required for an element repair, then the spare part is withdrawn from stock at the instant the repair commences (i.e. as soon as the element leaves the repair queue). The maximum number of spare parts of each type that may be held in stock is user-defined. The stock may either be replenished periodically at a user-defined time interval, or when the stock falls below a user-defined level, in which case RAMP allows a user-defined time delay between reordering and the actual replenishment of the stock.
Group Q value repair condition
RAMP allows the user to specify that an element cannot be repaired until the 'q value' of a nominated group satisfies one of six conditions (>, ≥, <, ≤, =, ≠) relative to a user-defined non-negative real number repair constraint. These conditions may be used to model certain rules in a system (e.g. a pump cannot be repaired until a tank is empty).
Buffer level repair condition
Specifying a buffer level constraint means that preventive maintenance of an element can be restricted until the buffer level of a nominated buffer group satisfies one of six conditions (>, ≥, <, ≤, =, ≠) relative to a user-defined non-negative real number repair constraint. These conditions may be used to model certain rules in a system (e.g. it may be a requirement for maintenance of a submersible pump that the tank it is in should be empty before repair work commences).
Element state repair condition
RAMP allows the user to specify that an element cannot be repaired until the state of another nominated element satisfies one of six conditions (>, ≥, <, ≤, =, ≠) relative to a user-defined non-negative real number repair constraint.
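The comparison-based conditions described above, together with the rule that every condition defined for an element must hold before its repair can start, can be sketched as follows. This is a hypothetical illustration: the names `CONDITIONS`, `condition_met` and `repair_allowed`, and the representation of a condition as a `(current_value, operator, constraint)` tuple, are assumptions and not part of RAMP.

```python
import operator

# The six comparison conditions, keyed by ASCII spellings of >, ≥, <, ≤, =, ≠.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def condition_met(current_value, op, constraint):
    """True if 'current_value op constraint' holds."""
    return CONDITIONS[op](current_value, constraint)


def repair_allowed(conditions):
    """A repair may start only when every (value, op, constraint) condition holds."""
    return all(condition_met(value, op, limit) for value, op, limit in conditions)


# Example: a submersible pump may only be repaired once the tank is empty
# (buffer level = 0) and the upstream group has stopped producing (q value <= 0).
tank_level, upstream_q = 0.0, 0.0
can_start = repair_allowed([(tank_level, "=", 0.0), (upstream_q, "<=", 0.0)])  # True
```

If any condition fails, the element would remain in the repair queue until the model state changes.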
Each element has user-defined parameters that can affect how it is repaired:
- Logistic repair delay: A time period that must elapse before a repair can start on an element. It is a fixed time that is added to the repair time sampled from the user-defined repair probability distribution for the element. Typically, it represents a combination of the time taken for the repair team to reach the site of failure, time to isolate the failed item, and time taken to obtain the required spare part from store.
- Repair 'good-as-new' or 'bad-as-old': Refers to the failure rate of an element rather than its 'q-value'. By default an element is restored to 'good-as-new' following repair, but there is an option to toggle a 'bad-as-old' state that simulates a quick-fix equivalent to restoring the element to the beginning of the wear-out phase of a Weibull bathtub curve, should a Weibull probability distribution with shape greater than one be used for repairs.
- Repair priority: Used only if element resource and repair conditions are specified (i.e. it is only used if an element has to queue for repair rather than going directly for repair). The purpose of this field is to help determine the sequence in which elements are drawn from the repair queue as resources become available for element repair. Elements are repaired according to their repair priority, where 1 is highest priority, 2 is next highest, and so on. Elements with the same priority are repaired on a 'first come first served' basis.
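The queue ordering described for repair priority (lowest number first, with ties resolved on a 'first come first served' basis) can be sketched with a binary heap. The class name `RepairQueue` and its methods are hypothetical and only illustrate the ordering rule; they do not reflect RAMP's internal data structures.

```python
import heapq
import itertools


class RepairQueue:
    """Elements waiting for repair, ordered by priority and then by arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # increasing counter breaks priority ties

    def push(self, element, priority):
        # Priority 1 is the highest; the arrival number gives first come first served.
        heapq.heappush(self._heap, (priority, next(self._arrival), element))

    def pop(self):
        """Remove and return the next element to be repaired."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = RepairQueue()
queue.push("pump B", priority=2)
queue.push("isolation valve", priority=1)
queue.push("pump A", priority=2)
order = [queue.pop() for _ in range(len(queue))]
# order == ["isolation valve", "pump B", "pump A"]
```

In a fuller model the element at the head of the queue would only be released once its resource conditions are satisfied; that check is omitted here.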
In addition, each element in a Standby Redundant group has more parameters that can affect how it is repaired:
- Passive failure rate factor: Factor by which the element failure rate is multiplied when operating in the passive state as opposed to the active state. By default this factor is one; it is typically set between zero and one, indicating a lower failure rate in the passive state than in the active state.
- Probability of switching failure: Percentage probability that the element will fail when switched from the passive state into the active state. If such a switching failure occurs, the element must be repaired in the normal way before it can be used again.
- Startup delay: Startup of the element going from a passive state to an active state is delayed by a specified time.
RAMP allows the user to model preventive maintenance for each system element by cycles expressed using the three parameters 'up-time', 'down-time' and 'down-time' start time. RAMP also has an option to toggle 'intelligent preventive maintenance' on each system element, which attempts to improve system performance by doing preventive maintenance when the element is already in 'down-time' for other reasons.
Common mode failures
Common mode failures (CMFs) cause a number of elements to fail at the same time (e.g. due to the occurrence of a fire or some other catastrophic event, or the failure of a power supply that provides power to several separately defined elements). RAMP allows the user to define CMFs by stating the set of affected elements and the frequency distribution for occurrences of the CMF. When a CMF occurs, any elements which are affected by that particular CMF are placed in the failed state and must be repaired, being queued for repair if necessary. Any elements failed by a CMF will be repaired according to the repair distribution defined for that element. Elements which are already being repaired, are in the repair queue, or are undergoing preventive maintenance remain unaffected by the occurrence of an associated CMF.
The criticality of an element is a measure of how much the element has affected the 'q value' (i.e. performance) of the group to which it belongs. Elements with a high criticality cause more 'down-time' or unavailability on average and are thus critical to the performance of the group. The criticality of an element may vary according to the level of the group (e.g. a motor failure may have a very high criticality for a group that contains failure modes for one pump, but a very low criticality for a group that contains several redundant pumps).
RAMP allows the user to set the time unit of interest, according to scale and fidelity considerations. The only requirement is that time units should be used consistently across a model to avoid misleading results. Time units are expressed in the following input data:
- Element failure probability distributions.
- Element repair probability distributions.
- Element logistic delay times (before repair).
- Element preventive maintenance 'up-times', 'down-times' and start points.
- Common mode failure probability distributions.
- Percentile times in empirical probability distributions (for failure or repair).
- Delay times in Time groups.
- Spare part replenishment intervals or re-order delay times.
- Rolling average span and increment.
- Histogram 'down-times'.
- Simulated time period of interest.
Elements that are assumed to have the same failure and repair characteristics and share a common pool of spare parts can be assigned the same user-defined element type (i.e. pump, motor, tank etc.). This allows for faster construction of complex systems containing many elements that are similar in function since the entry of element data does not need to be repeated for such elements.
Previously built systems can be imported as subsystems of the system currently displayed. This allows for faster construction of complex systems containing many subsystems since they can be constructed in parallel by multiple users before being imported into a common system.
RAMP Model Processor
The RAMP Model Processor mimics the system operating over the time period of interest - known in RAMP as a mission - by sampling failure and repair times from probability distributions (with probabilities drawn from a pseudo-random number generator) and combining with other data defined in the RAMP Model Builder to determine state transition events for each element in the model. The simulation uses discrete events that are queued in chronological order with each event being processed in turn to determine the states and thus the 'q values' of every element in the model at that discrete point in time. Group combination rules are used to determine the 'q values' at successively higher levels of groups, culminating in 'q values' of the outermost groups that when averaged over the events of the simulation typically provide performance measures of the system, which are output in model results in terms of the chosen parameters of interest.
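The event-driven approach described above can be illustrated with a deliberately small sketch: a single Minimum group of elements, each alternating between "up" and "down", with events held in a chronologically ordered queue. This is not the RAMP Model Processor. The function name `simulate_mission`, the use of negative exponential failure and repair times, and the time-weighted averaging of the group 'q value' are all assumptions chosen to keep the example short.

```python
import heapq
import random


def simulate_mission(elements, mission_length, rng):
    """Return the time-averaged q value of a Minimum group over one mission.

    elements maps a name to (Q, mean_time_to_failure, mean_time_to_repair).
    """
    up = {name: True for name in elements}
    events = []  # heap of (event_time, element_name), earliest first
    for name, (_, mean_up, _) in elements.items():
        heapq.heappush(events, (rng.expovariate(1.0 / mean_up), name))

    now, q_time = 0.0, 0.0
    while events:
        event_time, name = heapq.heappop(events)
        event_time = min(event_time, mission_length)
        # Group q value held since the previous event (Minimum rule).
        q_now = min(elements[n][0] if up[n] else 0.0 for n in elements)
        q_time += q_now * (event_time - now)
        now = event_time
        if now >= mission_length:
            break
        # State transition: a failure if the element was up, a repair if it was down.
        up[name] = not up[name]
        _, mean_up, mean_down = elements[name]
        mean = mean_up if up[name] else mean_down
        heapq.heappush(events, (now + rng.expovariate(1.0 / mean), name))
    return q_time / mission_length


rng = random.Random(7)
model = {"pump": (100.0, 900.0, 100.0), "valve": (100.0, 5000.0, 50.0)}
average_q = simulate_mission(model, mission_length=8760.0, rng=rng)
```

Each element always has exactly one pending event in the queue, so the loop ends when the earliest pending event lies beyond the end of the mission.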
By running enough missions over the same time period of interest (different possible histories from the same starting point), RAMP can be used to generate statistically significant results that establish the likely distribution of the user-defined parameters of interest and thus objectively assess the system, with the confidence bands on the results dependent on the number of missions simulated. On the other hand, by running a mission length that is long in comparison with the failure frequencies and repair times, and simulating only one mission, RAMP can be used to establish the steady-state performance of the system.
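The dependence of the confidence band on the number of missions can be shown with a short calculation. The sketch below assumes a list of per-mission results is already available and uses a normal approximation with z = 1.96 for a 95% interval; the function name `summarise_missions` and the choice of interval are assumptions, and the article does not say how RAMP itself computes its confidence bands.

```python
import math
import statistics


def summarise_missions(results, z=1.96):
    """Return (mean, half_width) of an approximate confidence interval.

    The half-width is z * s / sqrt(n), so it narrows as more missions are run.
    """
    mean = statistics.fmean(results)
    half_width = z * statistics.stdev(results) / math.sqrt(len(results))
    return mean, half_width


# Illustrative availability results from five missions (made-up numbers).
five = [0.90, 0.92, 0.88, 0.91, 0.89]
mean_5, band_5 = summarise_missions(five)

# The same spread of results over twenty missions gives a narrower band.
mean_20, band_20 = summarise_missions(five * 4)
```

With the made-up figures above, the mean stays at 0.90 in both cases while the band narrows from roughly 0.014 to roughly 0.006, reflecting the square-root dependence on the number of missions.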
History of RAMP
RAMP was originally developed by Rex Thompson & Partners Ltd. in the mid-1980s as an availability simulation program, primarily used for plant and process modelling. The ownership of RAMP was transferred to T.A. Group upon its founding in January 1990, and then to Fluor Corporation when it acquired T.A. Group in April 1996, before passing to the Advantage Technical Consulting business of parent company Advantage Business Group Ltd., formed in February 2001 by a management buy-out of the consulting and information technology businesses of Fluor Corporation, operating in the transport, defence, energy and manufacturing sectors. RAMP is currently owned by Atkins following its acquisition of Advantage Business Group Ltd. in March 2007. Extensive redevelopment by Atkins of the original RAMP application for DOS has produced a series of RAMP applications for the Microsoft Windows platform, with the RAMP Model Builder written in Visual Basic and the RAMP Model Processor written in FORTRAN.
Uses of RAMP
Due to its inherent flexibility, RAMP is now used to optimise system design and support critical decision making in many sectors. RAMP provides the capability to model many factors that may affect a system, such as changes in specification or procurement contracts, 'what if' studies, sensitivity analysis, equipment redundancy, equipment criticality and delayed failures, and allows the generation of results that can be exported for failure mode, effects and criticality analysis (FMECA) and cost-benefit analysis.
- Reliability, Maintainability and Risk: 7th Edition. Elsevier. David J. Smith BSc PhD CEng FIEE FIQA HonFSaRS MIGasE.