- 1 Missing definition of single point of contention
- 2 See also section
- 3 Looks good for the customer if there are no failures
- 4 Bayes Theorem
- 5 Software reliability
- 6 Single Point of Failure
- 7 Incorrect usage:functional unit
- 8 Salvaged copy
- 9 Defining "failure" vs. degradation
- 10 Reliability statistics
- 11 Certification and Reliability engineering education
- 12 Insufficient in-line citations
- 13 Contribution by IP user 184.108.40.206
- 14 Maybe fix Random capitalization?
Missing definition of single point of contention
See also section
In the see also section of this article, I have just added a link to product qualification even though such a page does not exist today. This is such a common term in both reliability engineering and quality engineering, that it would be very useful if someone could start to write it up. A Google search for "product qualification" just found 65,000 hits, so there's plenty to work from. DFH 19:59:33, 2005-09-01 (UTC)
Looks good for the customer if there are no failures
"Anyhow it looks good for the customer if there are no failures." This is a legitimate point, but it needs to be differently expressed. I can't come up with an alternative that doesn't use weasel words, so I've left it alone. Tom Harrison Talk 17:34, 27 February 2006 (UTC)
- Something like this could be used: "No failures seems more reliable to ??? without detailed knowledge of ###". ??? could be customers, people or others. ### could be statistics or mathematics. --Nordby73 11:11, 28 February 2006 (UTC)
Bayes Theorem
Bayes' theorem is a very important tool in reliability engineering for reducing a complicated system to a simpler system or a black box. It should be mentioned in the article. -- HN.
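To make the suggestion above concrete, here is a minimal sketch of the conditioning (decomposition) approach that some reliability texts present under the heading of Bayes' theorem; strictly speaking it uses the law of total probability. It may or may not be exactly what HN has in mind. The five-component bridge network, the component names `a` to `e`, and the reliability value 0.9 are all assumptions chosen for illustration. Conditioning on the bridge component `e` reduces the network to two plain series-parallel systems, and a brute-force enumeration is included as a cross-check.

```python
from itertools import product


def series(*rs):
    """Reliability of independent components in series."""
    p = 1.0
    for r in rs:
        p *= r
    return p


def parallel(*rs):
    """Reliability of independent components in parallel."""
    q = 1.0
    for r in rs:
        q *= 1.0 - r
    return 1.0 - q


def bridge_reliability(a, b, c, d, e):
    """Bridge network: paths a-b, c-d, a-e-d, c-e-b (hypothetical layout).

    Condition on the bridge component e:
      e works -> (a || c) in series with (b || d)
      e fails -> (a then b) in parallel with (c then d)
    """
    works_given_e_up = series(parallel(a, c), parallel(b, d))
    works_given_e_down = parallel(series(a, b), series(c, d))
    return e * works_given_e_up + (1.0 - e) * works_given_e_down


def bridge_brute_force(a, b, c, d, e):
    """Cross-check: sum the probability of every working system state."""
    probs = (a, b, c, d, e)
    total = 0.0
    for state in product([True, False], repeat=5):
        A, B, C, D, E = state
        works = (A and B) or (C and D) or (A and E and D) or (C and E and B)
        if works:
            p = 1.0
            for up, r in zip(state, probs):
                p *= r if up else 1.0 - r
            total += p
    return total


if __name__ == "__main__":
    print(bridge_reliability(0.9, 0.9, 0.9, 0.9, 0.9))
    print(bridge_brute_force(0.9, 0.9, 0.9, 0.9, 0.9))
```

Both functions should agree to within floating-point error. The point of the decomposition is that each conditional sub-system can be evaluated with the ordinary series and parallel formulas.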
Software reliability
Most software errors are "chaotic": the malfunction doesn't worsen with gradual changes to the input values or conditions of use, but is instead either absent or completely fatal, and the latter only with very specific input values or conditions of use. Furthermore, the sequential way in which software is usually combined makes errors highly dependent: one failure or hangup somewhere may make the whole system fail or hang. So statistical reliability analysis doesn't seem to make much sense when applied to software errors in general. It does make sense when applied to performance or scalability issues, which are more likely to be relatively independent and more likely to have "gradual" behavior.
(I am a layman on this subject, but I am a software developer, and I feel a paragraph of this kind should be included in the article. What do you think?) —The preceding unsigned comment was added by Rp (talk • contribs) 19:02, 28 February 2007 (UTC).
- Hey, not only do I agree - whole BOOKS should be written on the subject of software reliability. In addition, Wikipedia lacks articles on network reliability. Hardware is not the sole cause of system failure, and we live in a systems engineering world where complexity reigns. Software and networks are now a major component of many complex systems. Please feel free to elaborate, but most importantly, add new articles on subject matter that is obviously lacking.--220.127.116.11 (talk) 02:58, 3 March 2011 (UTC)
- Let me come back with some counter-argument though - a customer does not care what the root cause of a failure is. He/She only knows the system does not do what needs to be done to satisfy his/her mission. They don't care if it is systematic or chaotic or unrepeatable, gradual or immediate, hardware or software. So measures of reliability should be from the POV of customer satisfaction.--18.104.22.168 (talk) 01:00, 20 April 2011 (UTC)
Single Point of Failure
The term is part of the SPF disambiguation. When it's clicked on, we are redirected here. However, there is no mention of this term in the article. I know it to be a part of a system that, when it fails, also makes a substantial (I know, that's vague) part of the rest of the system fail. For example, a power supply in a home computer (a defect will render the entire computer unusable), or a network switch in a small network, where all computers/servers are hooked up to that single switch (which will render the whole network useless).
I am hardly a capable Wikipedia editor, nor am I very familiar with reliability theory -- in fact, I came here to look up a clear explanation for non-IT people -- but wouldn't it make sense to include at least a mention of this term in the article, or otherwise remove the redirect and make a tiny article/stub that includes a link to reliability theory for a broader explanation of the topic?
22.214.171.124 17:43, 19 September 2007 (UTC) Frederik
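As a possible aid for the non-IT explanation requested above, the following sketch treats a single point of failure as any component whose failure alone, with everything else working, takes the whole system down. The function name `single_points_of_failure`, the component names, and the mirrored-disk layout are hypothetical; only the power supply example comes from the comment above.

```python
def single_points_of_failure(components, system_works):
    """Return the components whose lone failure brings the system down.

    components   -- list of component names
    system_works -- function mapping {name: is_up} to True/False
    """
    spofs = []
    for candidate in components:
        # Everything is up except the candidate component.
        state = {name: name != candidate for name in components}
        if not system_works(state):
            spofs.append(candidate)
    return spofs


def home_computer_works(state):
    # Assumed layout: one power supply, two mirrored disks.
    return state["psu"] and (state["disk1"] or state["disk2"])


if __name__ == "__main__":
    parts = ["psu", "disk1", "disk2"]
    print(single_points_of_failure(parts, home_computer_works))
```

Under these assumptions only the power supply is reported, because either disk can fail on its own without stopping the computer.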
Incorrect usage:functional unit
The sentence: "The probability that a functional unit will perform .." redirects to execution unit, which from what I can gather pertains to CPUs. Depending on what is correct, either (i) the article on execution unit needs to change, or (ii) functional unit needs its own article to support the broader notion of black-box-that-can-be-quantifiably-tested-for-reliability, or (iii) a different-but-correct term in place of "functional unit" must be used here instead. Vonkje (talk) 19:36, 2 April 2009 (UTC)
Salvaged copy
The strategy to prevent total system failure is:
- Reduced complexity
- Complex systems shall be designed according to principles decomposing complexity to the required level.
- Redundant systems include a double instance for any critical component with an automatic and robust switch or handle to turn control over to the other well functioning unit (failover)
- Diversity design is a special redundancy concept that calls for the doubling of functionality in completely different design setups of components to decrease the probability that redundant components might fail both at the same time under identical conditions.
- Whatever systems design will deliver, long term reliability is based on transparent and comprehensive documentation.
- Incorrect usage
- The term is incorrectly applied to a software development and support environment, in relation to staff. The correct term in this instance is Single point of resolution.
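The redundancy and diversity bullets in the salvaged text can be illustrated with a rough sketch. It assumes a simplified beta-factor style model, which is not taken from the salvaged text: each unit fails with probability `q`, a fraction `beta` of that probability is attributed to a common cause that takes out both units at once, and the remainder is independent. The numeric values are made up for illustration.

```python
def pair_failure_probability(q, beta):
    """Probability that both units of a redundant pair are down.

    Simplified beta-factor style assumption:
      common cause : probability beta * q, fails both units together
      independent  : probability (1 - beta) * q per unit
    """
    common = beta * q
    independent = (1.0 - beta) * q
    # Both fail if the common cause strikes, or if it does not
    # and both units fail independently.
    return common + (1.0 - common) * independent ** 2


if __name__ == "__main__":
    q = 0.01
    print("single unit:      ", q)
    print("identical pair:   ", pair_failure_probability(q, beta=0.10))
    print("diverse pair:     ", pair_failure_probability(q, beta=0.01))
    print("fully independent:", pair_failure_probability(q, beta=0.0))
```

In this model a smaller `beta` stands in for a more diverse design: the less the two units share, the closer the pair gets to the ideal `q ** 2`.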
Defining "failure" vs. degradation
Defining failure seems to be an important factor in any reliability program. Yet there are many cases where the outcome is not mission failure but simply degradation. For example, a single engine failure on a four-engine airplane (resulting in three operational engines) can (if designed that way) result only in degradation of the plane's performance. It can still fly and land safely. Does this count as a failure? Similarly, loss of a router on the internet might result in degraded performance, but the internet remains operational. I suspect there are more case examples to ask about. Failure seems absolute to me, yet we live in a complex world, full of workarounds. --126.96.36.199 (talk) 02:44, 3 April 2011 (UTC)
- I don't see the problem. Yes, failure of a plane engine counts as failure of the engine, not as failure of the plane; failure of an Internet router does not imply failure of the Internet. Your third example is unclear: what is the requirement exactly? Rp (talk) 20:21, 3 April 2011 (UTC)
I removed the third example since you are right, it is a bit confusing. I am not sure what my point was when I wrote it initially. I might have been thinking of certain lines of software code that never get executed (code coverage). However, back to the first two points - it's about "how do you define failure?". There is one form of "mission failure" (functional - navigate and land a plane safely in one context, or deliver a message on the internet) and another form of "mission failure" (performance - get your aircraft or message to your destination on time). So the distinction lies in exactly how one defines "failure". Allow me to create a new third example - my car is in perfect condition. I must be able to get to and from work in less than 30 minutes in a normal daily commute. One day the "check engine" light comes on, yet everything seems to run normally. Is that really a failure when it has no mission impact? All I am asking is how one goes about defining a system failure. --188.8.131.52 (talk) 00:45, 20 April 2011 (UTC)
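One way to frame the distinction discussed in this thread is a k-out-of-n model, sketched below. The per-engine reliability of 0.99, the assumption that engines fail independently, and the threshold of three working engines for mission success are all hypothetical choices for illustration; the real requirement would come from the system specification.

```python
from math import comb


def prob_exactly_working(n, k, r):
    """Probability that exactly k of n independent units work."""
    return comb(n, k) * r ** k * (1.0 - r) ** (n - k)


def mission_outcomes(n, r, min_for_mission):
    """Split outcomes into (fully working, degraded, mission failure).

    degraded        : at least min_for_mission but fewer than n units work
    mission failure : fewer than min_for_mission units work
    """
    full = prob_exactly_working(n, n, r)
    degraded = sum(prob_exactly_working(n, k, r)
                   for k in range(min_for_mission, n))
    failed = sum(prob_exactly_working(n, k, r)
                 for k in range(0, min_for_mission))
    return full, degraded, failed


if __name__ == "__main__":
    full, degraded, failed = mission_outcomes(n=4, r=0.99, min_for_mission=3)
    print("all engines working:", full)
    print("degraded but flying:", degraded)
    print("mission failure:    ", failed)
```

The choice of `min_for_mission` is exactly the definitional question raised above: moving that threshold changes which events count as failure and which count as degradation, without changing anything about the hardware.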
Reliability statistics
The page for Reliability (statistics) currently redirects to Reliability (psychometrics). This redirect does not make sense to me. Reliability engineering is a heavy user of statistical tools, many of them specific to the reliability function. I suggest that Reliability (statistics) redirect to this page on Reliability Engineering. An alternative might be to make Reliability (statistics) an article of its own; this could give more discussion of the statistical tools used in the reliability sciences. Any comment? Rlsheehan (talk) 18:03, 25 November 2012 (UTC)
- Oh gosh, you are correct. That must be addressed, now! Psychometrics and reliability engineering are, well, different. I will try to remediate the problem you identified, and do so by implementing the solution you proposed, as it is reasonable. Thank you! --FeralOink (talk) 13:10, 2 June 2013 (UTC)
Certification and Reliability engineering education
I have, based on my professional experience in this area, checked the text under "Certification" and "Reliability engineering education" and found that it represents both European and international perspectives. I tried to modify the text and can add some sentences here and there, but the current content is quite OK. I propose that the comments regarding “not representing a worldwide view of the subject” from June 2012 be removed. — Preceding unsigned comment added by Dependability (talk • contribs) 08:40, 22 March 2013 (UTC)
Insufficient in-line citations
I've tagged this article for not enough in-line citations; it's a great article, but it has only four citations against more than 40 references (if you include the standards as well). — Sasuke Sarutobi (talk) 01:02, 1 September 2013 (UTC)
- I've tagged the article again, now for at least the third time. 13 in-line citations is a good start, but an article of its length needs considerably more; I'd expect in the region of about 100. The balance towards the 'further reading' section is far too heavy. — Sasuke Sarutobi (talk) 09:42, 30 October 2014 (UTC)
Contribution by IP user 184.108.40.206
At 10:39 on 8 June 2013, IP user 220.127.116.11 added to the article this paragraph:
- "Although Reliability is defined and affected by stochastic parameters, according to some acknowledged specialists, quality, reliability and safety are NOT achieved by mathematics and statistics. Nearly all teaching and literature on the subject emphasises these aspects, and ignores the reality that the ranges of uncertainty involved largely practically invalidate quantitative methods for prediction and measurement."
along with a citation to O'Connor's 2002 book Practical Reliability Engineering.
Subsequent edits slightly improved the paragraph's spelling and style, but they have left the illogical sentence structure, misused citation, and rant-like tone so that it now reads:
- "Although reliability is defined and affected by stochastic parameters, according to some acknowledged specialists, quality, reliability and safety are NOT achieved by mathematics and statistics. Nearly all teaching and literature on the subject emphasizes these aspects, and ignores the reality that the ranges of uncertainty involved largely invalidate quantitative methods for prediction and measurement."
Although I rather agree with the sentiment of the paragraph, I think it clearly needs to be rewritten and perhaps moved to a more appropriate place in the article. Before I make these changes, does anyone want to offer any suggestions? Who are the acknowledged specialists? A citation is needed. Is the complaint that the teaching and literature are devoted to the math and statistics but that these by themselves don't improve reliability (only better engineering does)? The O'Connor book hardly seems to offer support for the idea that quantitative analysis is invalid because of its uncertainty. I know some citations for this idea. Does anyone want to offer some better citations? Should the paragraph perhaps go into a section on limitations of reliability analysis? Scwarebang (talk) 00:35, 24 April 2014 (UTC)
Maybe fix Random capitalization?
The Capitalization is random, or at least Way, Way non-modern-English in its Pattern. Perhaps an Energetic Person could fix this. 18.104.22.168 (talk) — Preceding undated comment added 18:59, 8 April 2015 (UTC)