Applying FMEA to Software
Author(s) - Rick Homkes, Henry Kraebber, Donna Evanecky
Publication year - 2020
Language(s) - English
Resource type - Conference proceedings
DOI - 10.18260/1-2--14815
Subject(s) - failure mode and effects analysis , computer science , process (computing) , reliability (semiconductor) , quality (philosophy) , reliability engineering , software , risk analysis (engineering) , software engineering , engineering , medicine , power (physics) , philosophy , physics , epistemology , quantum mechanics , programming language , operating system
Failure Mode and Effect Analysis (FMEA) is a well-known industry technique for improving the reliability, quality, and safety of products and processes. It “can be described as a systematic group of activities intended to: (a) recognize and evaluate the potential failure of a product/process and the effects of that failure, (b) identify actions that could eliminate or reduce the chance of the potential failure occurring, and (c) document the entire process.” [1] The focus of FMEA is on the design of products and processes. FMEA provides designers with a formal process and “disciplined techniques to identify and minimize the impact of design concerns.” [1] It is intended to be used early in the development process, not after problems have become evident. The FMEA procedure relies on cross-functional teams with open discussion and communication. Effective preventive and corrective actions based on the FMEA findings are essential; an FMEA study without proper action on its findings and recommendations produces little value.

FMEA has generally not been applied to software. As we move toward ubiquitous computing, however, software will be more completely integrated into products that consumers use every day. The increasing use of embedded systems is matched by an increase in their importance: the effect of a failure on the average consumer has gone from inconvenience or economic harm to actual injury or death. This means that FMEA, along with other techniques such as Preliminary Hazard Analysis (PHA), Reliability Block Diagrams (RBD), and Fault Tree Analysis (FTA), needs to be an integral part of design, development, manufacturing, and maintenance. These techniques should apply not just to individual products, but also to complete systems.
In other words, these techniques should be applied not only to the hardware of an Antilock Braking System (ABS) controller, but also to the ABS software, the entire braking system, and the entire vehicle. This paper begins with an overview of PHA, RBD, FTA, and FMEA. It then investigates how these methods could be incorporated into courses that cover programming and software engineering.

Proceedings of the 2005 American Society for Engineering Education Annual Conference & Exposition. Copyright © 2005, American Society for Engineering Education.

Review of Quality / Reliability / Safety Techniques

Quality assurance requirements and techniques have been with us for some time. Even Hammurabi’s Code of Law from 1750 BCE mentions several basic quality assurance provisions. These include corrections in the manufacturing process: “If a builder build a house for some one, even though he has not yet completed it; if then the walls seem toppling, the builder must make the walls solid from his own means;” the failure of a product: “If a builder build a house for some one, and does not construct it properly, and the house which he built fall in and kill its owner, then that builder shall be put to death;” and an early type of product warranty in the building of a ship: “If a shipbuilder build a boat for some one, and do not make it tight, if during that same year that boat is sent away and suffers injury, the shipbuilder shall take the boat apart and put it together tight at his own expense. The tight boat he shall give to the boat owner.” [2] Quality is thus closely related to reliability and safety, but the three are not equivalent. For many organizations, quality is defined by the customer and is related to value or return on expense. Reliability concerns how susceptible a product or process is to failure. Safety is freedom from harm and is associated with a reasonable risk for a particular use.
Even this must be further defined, as freedom from harm can be broken down into economic harm (an unsecured web site is hacked and credit card numbers are stolen and used) or physical harm (a product failure that causes injury or death to a user or bystander). Lastly, reasonable risk has both societal and legal definitions.

As microcontroller-driven products become commonplace in the consumer market, the definitions above become important to software engineers as well as hardware engineers. While software people are very familiar with testing, more system-level (hardware and software) thinking is needed as we build and deploy more mechatronic systems. For example, until recently almost all automotive steering (like all flight control in the past) was a purely hardware system. Electronic power-assist steering added sensors and actuators to the system along with a microcontroller. Currently available systems use steer-by-wire for the rear wheels while maintaining the steering linkage in the front. As we continue down this path, weight and cost concerns will soon push us to total steer-by-wire and brake-by-wire systems. At that point software engineers must be as familiar with risk avoidance and mitigation techniques as their hardware colleagues.

Several techniques allow product designers to build more quality, safety, and reliability into their products. A quick review includes Preliminary Hazard Analysis (PHA), Fault Tree Analysis (FTA), Reliability Block Diagrams (RBD), and Failure Mode and Effect Analysis (FMEA). While this set of techniques is not complete, it does include techniques that are both deductive (top-down) and inductive (bottom-up).
As stated by Amberkar et al., “Deductive techniques focus on systematically identifying causes of undesirable effect, while inductive techniques focus on predicting effects of a priori known problems such as faults.” [3] Just as the combination of top-down and bottom-up approaches leads to better software design, a combination of deductive and inductive approaches to quality can be used to build more safety into products.

A PHA (see Table 1) is needed at the beginning of any product development cycle to remind the design team that there are risks associated with the use of the product. Examples from the automotive world include “loss of steering” or “loss of braking.” A former student told one of the authors about a fuel cell chemical reaction he was working on that had the possibility of a “sudden and irreversible exothermic event.” A simple table containing only a few columns, such as Event, Effect, Criticality, and Control, reminds the development team, including the software engineers, that risk mitigation and management of these events are needed.

Table 1 Generic Preliminary Hazard Analysis

Event        Effect              Criticality  Control
Loss of ...  Possible collision  High         Fault tolerant controller

An RBD (see Figure 1) is similar to a Functional Block Diagram (FBD) in that it connects the components of a system. These components could be hardware (e.g., braking calipers), hardware and software together (the ABS controller), or purely software (the ABS program). “The basic steps for creating an RBD for a specific hazard are:
1. Determine the input starting point for the system and the flow of the system from the input to the output.
2. Starting from the input and working toward the output, identify the system components that could contribute to the specific hazard if they failed.
3. For each component, create a block in the RBD and place it in a position relative to its position in the input to output flow of the system.” [3]

The RBD can then be analyzed to find which component or set of components can lead to an undesired event. While the RBD is more detailed than the PHA, it is still very high level, dealing with the block components of the system.

Figure 1 Generic Reliability Block Diagram

An FTA starts with the unwanted system event that may have been identified in the PHA or RBD above and works its way back to the individual failures that cause the event. In other words, “Fault tree analysis (FTA) is a top-down approach to failure analysis, starting with a potential undesirable event (accident) called a TOP event, and determining all the ways it can happen.” [4] The ways that it can happen, or the individual failures along the way, are connected with AND gates and OR gates to reach the final event. For example, Figure 2 shows how a system failure could occur through the failure of either the actuator or the actuator command. While product (and thus software) development uses FTA prospectively, it can also be used historically or diagnostically. An episode of the television series Crime Scene Investigation (CSI) illustrated this: the investigation of a bus accident found that a tire failure, caused by a contaminant placed in the tire, together with a bolt failure from a defective part, caused the loss of control and the crash.

Figure 2 Generic Fault Tree Analysis

Failure Mode and Effects Analysis

Unlike PHA, RBD, and FTA, FMEA is considered an inductive approach to failure analysis. Here the analysis starts from an individual component (or, in the case of a software FMEA, an individual variable) and follows the problem as it influences the subsystem or system. The goal of an FMEA is to identify problems and rank them using a Risk Priority Number, or RPN.
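The ranking step can be illustrated with a minimal sketch. The text above does not define how the RPN is computed, so the sketch assumes the widely used convention that the RPN is the product of severity, occurrence, and detection ratings, each on a 1 to 10 scale. The class name, field names, failure modes, and ratings are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical software FMEA worksheet."""
    item: str        # component or variable under analysis
    failure: str     # how it can fail
    severity: int    # 1 (negligible) .. 10 (hazardous)
    occurrence: int  # 1 (remote) .. 10 (almost certain)
    detection: int   # 1 (almost always detected) .. 10 (undetectable)

    def __post_init__(self):
        # Reject ratings outside the assumed 1..10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # Assumed convention: RPN = severity * occurrence * detection.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical worksheet rows for an ABS-like controller.
worksheet = [
    FailureMode("wheel_speed", "value stuck at zero", 9, 3, 4),
    FailureMode("brake_pressure_cmd", "integer overflow", 10, 2, 6),
    FailureMode("status_lamp", "flag never set", 4, 5, 2),
]

for mode in rank_by_rpn(worksheet):
    print(f"{mode.rpn:4d}  {mode.item}: {mode.failure}")
```

With these made-up ratings the overflow in `brake_pressure_cmd` ranks first (RPN 120), ahead of the stuck `wheel_speed` value (108) and the `status_lamp` flag (40). A team would normally review the highest-ranked items first, while still examining any item with a high severity rating regardless of its RPN.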