Talk:Reliability engineering

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:

WikiProject Systems (Rated C-class, High-importance)
This article is within the scope of WikiProject Systems, which collaborates on articles related to systems and systems science. It has been rated as C-Class on the project's quality scale and as High-importance on the project's importance scale. It is within the field of Systems engineering.

WikiProject Statistics (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion. It has been rated as C-Class on the quality scale and as Mid-importance on the importance scale.

WikiProject Technology (Rated C-class)
This article is within the scope of WikiProject Technology, a collaborative effort to improve the coverage of technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. It has been rated as C-Class on the project's quality scale.

Missing definition of single point of contention

I was redirected to this article from single point of contention, but this article doesn't even mention the word "contention" once. AdamSpiers (talk) 10:02, 12 December 2013 (UTC)

See also section

In the see also section of this article, I have just added a link to product qualification even though such a page does not exist today. This is such a common term in both reliability engineering and quality engineering, that it would be very useful if someone could start to write it up. A Google search for "product qualification" just found 65,000 hits, so there's plenty to work from. DFH 19:59:33, 2005-09-01 (UTC)

Looks good for the customer if there are no failures

"Anyhow it looks good for the customer if there are no failures." This is a legitimate point, but it needs to be differently expressed. I can't come up with an alternative that doesn't use weasel words, so I've left it alone. Tom Harrison Talk 17:34, 27 February 2006 (UTC)

  • Something like this could be used: "No failures seems more reliable to ??? without detailed knowledge of ###".  ??? could be customers, people or others. ### could be statistics or mathematics. --Nordby73 11:11, 28 February 2006 (UTC)

Bayes Theorem

Bayes' theorem is a very important tool in reliability engineering for reducing a complicated system to a simpler system or a black box. It should be mentioned in the article. -- HN.
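
A possible illustration for the article, assuming the comment refers to the conditioning (decomposition) approach often taught under this name: pick a key component, evaluate the simpler system that remains when it works and when it has failed, and weight the two cases by that component's reliability. The five-component bridge network, the function name, and the independence of component failures are all assumptions made for this sketch; none of them comes from the comment above.

```python
def bridge_reliability(r_a, r_b, r_c, r_d, r_e):
    """Reliability of a hypothetical bridge network, conditioning on component E.

    Assumed layout: path A-B on top, path C-D on the bottom, and E bridging
    the two midpoints. Component failures are assumed independent.
    """
    # Case 1: E works, so the network reduces to
    # (A parallel C) in series with (B parallel D).
    given_e_works = (1 - (1 - r_a) * (1 - r_c)) * (1 - (1 - r_b) * (1 - r_d))

    # Case 2: E has failed, so the network reduces to
    # (A series B) in parallel with (C series D).
    given_e_failed = 1 - (1 - r_a * r_b) * (1 - r_c * r_d)

    # Weight the two conditional results by the probability of each case.
    return r_e * given_e_works + (1 - r_e) * given_e_failed


if __name__ == "__main__":
    print(bridge_reliability(0.9, 0.9, 0.9, 0.9, 0.9))
```

Each conditional case is an ordinary series-parallel system, which is the sense in which the complicated network is "reduced" to simpler ones.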

Software reliability

Most software errors are "chaotic", in that the malfunction doesn't worsen gradually as input values or conditions of use change; instead it is either absent or completely fatal, and fatal only for very specific input values or conditions of use. Furthermore, the sequential way in which software is usually combined makes errors highly dependent: one failure or hang somewhere may make the whole system fail or hang. So statistical reliability analysis doesn't seem to make much sense when applied to software errors in general. It does make sense when applied to performance or scalability issues, which are more likely to be relatively independent and more likely to have "gradual" behavior.

(I am a layman on this subject, but I am a software developer, and I feel a paragraph of this kind should be included in the article. What do you think?) The preceding unsigned comment was added by Rp (talk · contribs) 19:02, 28 February 2007 (UTC).

Hey, not only do I agree - whole BOOKS should be written on the subject of software reliability. In addition, Wikipedia lacks articles on network reliability. Hardware is not the sole cause of system failure, and we live in a systems engineering world where complexity reigns. Software and networks are now a major component of many complex systems. Please feel free to elaborate, but most importantly, add new articles on subject matter that is obviously lacking.--96.244.248.77 (talk) 02:58, 3 March 2011 (UTC)
Let me come back with some counter-argument though - a customer does not care what the root cause of a failure is. He/She only knows the system does not do what needs to be done to satisfy his/her mission. They don't care if it is systematic or chaotic or unrepeatable, gradual or immediate, hardware or software. So measures of reliability should be from the POV of customer satisfaction.--74.107.74.39 (talk) 01:00, 20 April 2011 (UTC)

Single Point of Failure

The term is part of the SPF disambiguation. When it's clicked on, we are redirected here. However, there is no mention of this term in the article. I know it to be a part of a system that, when it fails, also makes a substantial (I know, that's vague) part of the rest of the system fail. For example, a power supply in a home computer (a defect will render the entire computer unusable), or a network switch in a small network, where all computers/servers are hooked up to that single switch (which will render the whole network useless).

I am hardly a capable Wikipedia editor, nor am I very familiar with reliability theory -- in fact, I came here to look up a clear explanation for non-IT people -- but wouldn't it make sense to include at least a mention of this term in the article, or otherwise remove the redirect and make a tiny article/stub that includes a link to reliability theory for a broader explanation of the topic?

85.145.112.69 17:43, 19 September 2007 (UTC) Frederik

Incorrect usage: functional unit

The sentence: "The probability that a functional unit will perform .." redirects to execution unit, which from what I can gather pertains to CPUs. Depending on what is correct, either (i) the article on execution unit needs to change, or (ii) functional unit needs its own article to support the broader notion of black-box-that-can-be-quantifiably-tested-for-reliability, or (iii) a different-but-correct term in place of "functional unit" must be used here. Vonkje (talk) 19:36, 2 April 2009 (UTC)

Salvaged copy

I found the following uncited copy in the Single point of failure article. I determined that it did not belong there. Perhaps it may be of use here. --Kvng (talk) 20:09, 20 July 2010 (UTC)

The strategy to prevent total system failure is:

Reduced complexity: Complex systems shall be designed according to principles decomposing complexity to the required level.
Redundancy: Redundant systems include a double instance for any critical component with an automatic and robust switch or handle to turn control over to the other well functioning unit (failover).
Diversity: Diversity design is a special redundancy concept that calls for the doubling of functionality in completely different design setups of components to decrease the probability that redundant components might fail both at the same time under identical conditions.
Transparency: Whatever systems design will deliver, long term reliability is based on transparent and comprehensive documentation.
Incorrect usage: The term is incorrectly applied to a software development and support environment, in relation to staff. The correct term in this instance is Single point of resolution.
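
If the salvaged copy is reused, the redundancy item could be backed by a small worked example. The following is a minimal sketch, assuming independent component failures and perfect failover switching; the function names and the 0.9 figures are illustrative only and do not come from the salvaged text.

```python
from math import prod


def series_reliability(reliabilities):
    """All components are required: every one is a single point of failure."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Redundant components: the system fails only if all of them fail."""
    return 1 - prod(1 - r for r in reliabilities)


if __name__ == "__main__":
    # Two components of reliability 0.9, first chained, then duplicated.
    print(series_reliability([0.9, 0.9]))
    print(parallel_reliability([0.9, 0.9]))
```

The independence assumption is exactly what the diversity item questions: identical redundant units can fail together under identical conditions, in which case the parallel formula overstates the benefit.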

Defining "failure" vs. degradation

Defining failure seems to be an important factor in any reliability program. Yet there are many cases where the outcome is not mission failure but simply degradation. For example, a single engine failure on a four-engine airplane (leaving three operational engines) can, if the plane is designed that way, result only in degradation of the plane's performance. It can still fly and land safely. Does this count as a failure? Similarly, loss of a router on the Internet might result in degraded performance, but the Internet remains operational. I suspect there are more case examples to ask about. Failure seems absolute to me, yet we live in a complex world, full of workarounds. --74.107.74.39 (talk) 02:44, 3 April 2011 (UTC)

I don't see the problem. Yes, failure of a plane engine counts as failure of the engine, not as failure of the plane; failure of an Internet router does not imply failure of the Internet. Your third example is unclear: what is the requirement exactly? Rp (talk) 20:21, 3 April 2011 (UTC)

I removed the third example since you are right, it is a bit confusing. I am not sure what my point was when I wrote it initially. I might have been thinking of certain lines of software code that never get executed (code coverage). However, back to the first two points - it's about "how do you define failure?". There is one form of "mission failure" (functional - navigate and land a plane safely in one context, or deliver a message on the Internet) and another form of "mission failure" (performance - get your aircraft or message to your destination on time). So the distinction lies in exactly how one defines "failure". Allow me to create a new third example - my car is in perfect condition. I must be able to get to and from work in less than 30 minutes in a normal daily commute. One day the "check engine" light comes on, yet everything seems to run normally. Is that really a failure when it has no mission impact? All I am asking is how one goes about defining a system failure. --74.107.74.39 (talk) 00:45, 20 April 2011 (UTC)
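
The four-engine example in this thread can be made concrete with a k-out-of-n model, in which the chosen definition of "failure" becomes the parameter k. This is only a sketch under stated assumptions: engines fail independently with the same reliability, and the function name and the 0.99 figure are hypothetical, not taken from the discussion.

```python
from math import comb


def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical, independent units work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    engine_reliability = 0.99  # illustrative value only

    # Strict definition: any engine loss counts as a system failure (k = 4).
    print(k_out_of_n_reliability(4, 4, engine_reliability))

    # Degraded-but-safe definition: the plane needs at least 3 engines (k = 3).
    print(k_out_of_n_reliability(3, 4, engine_reliability))
```

The same hardware yields a different reliability figure depending on which definition is written into the requirement, which is the point both contributors seem to be circling.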

Reliability statistics

The page for Reliability (statistics) currently redirects to Reliability (psychometrics). This redirect does not make sense to me. Reliability engineering is a heavy user of statistical tools, many of them specific to the reliability function. I suggest that Reliability (statistics) redirect to this page on Reliability Engineering. An alternative might be to make an article of Reliability (statistics); this could give more discussion of the statistical tools used in the reliability sciences. Any comment? Rlsheehan (talk) 18:03, 25 November 2012 (UTC)

Oh gosh, you are correct. That must be addressed, now! Psychometrics and reliability engineering are, well, different. I will try to remediate the problem you identified, and do so by implementing the solution you proposed, as it is reasonable. Thank you! --FeralOink (talk) 13:10, 2 June 2013 (UTC)

Certification and Reliability engineering education

I have, based on my professional experience in this area, checked the text under "Certification" and "Reliability engineering education" and found that it represents both European and international perspectives. I tried to modify the text and can add some sentences here and there, but the current content is quite OK. I propose that the comments regarding “not representing a worldwide view of the subject” from June 2012 be removed. Preceding unsigned comment added by Dependability (talk · contribs) 08:40, 22 March 2013 (UTC)

Insufficient in-line citations

I've tagged this article for not having enough in-line citations; it's a great article, but it has only four in-line citations against more than 40 references (if you include the standards as well). -- Sasuke Sarutobi (talk) 01:02, 1 September 2013 (UTC)

Contribution by IP user 213.46.148.89

At 10:39 on 8 June 2013, IP user 213.46.148.89 added to the article this paragraph:

"Although Reliability is defined and affected by stochastic parameters, according to some acknowledged specialists, quality, reliability and safety are NOT achieved by mathematics and statistics. Nearly all teaching and literature on the subject emphasises these aspects, and ignores the reality that the ranges of uncertainty involved largely practically invalidate quantitative methods for prediction and measurement."

along with a citation to O'Connor's 2002 book Practical Reliability Engineering.

Subsequent edits slightly improved the paragraph's spelling and style, but they have left the illogical sentence structure, misused citation, and rant-like tone so that it now reads:

"Although reliability is defined and affected by stochastic parameters, according to some acknowledged specialists, quality, reliability and safety are NOT achieved by mathematics and statistics. Nearly all teaching and literature on the subject emphasizes these aspects, and ignores the reality that the ranges of uncertainty involved largely invalidate quantitative methods for prediction and measurement."

Although I rather agree with the sentiment of the paragraph, I think it clearly needs to be rewritten and perhaps moved to a more appropriate place in the article. Before I make these changes, does anyone want to offer any suggestions? Who are the acknowledged specialists? A citation is needed. Is the complaint that the teaching and literature are devoted to the math and statistics but that these by themselves don't improve reliability (only better engineering does)? The O'Connor book hardly seems to offer support for the idea that quantitative analysis is invalid because of its uncertainty. I know some citations for this idea. Does anyone want to offer some better citations? Should the paragraph perhaps go into a section on limitations of reliability analysis? Scwarebang (talk) 00:35, 24 April 2014 (UTC)