Talk:S.M.A.R.T.
This is the talk page for discussing improvements to the S.M.A.R.T. article.
WikiProject Computing (Rated Start-class, Low-importance)
[edit] Reallocated Sectors Count - mess
Whoever wrote this has mixed in a bunch of out-of-context explanations which together are nonsensical or contradictory. For instance, the drive itself has no concept of partitions. Or "failure of boot sector": obviously this makes no difference after it's reallocated, but antique info from the days of floppy disks has crept into the explanation... —Preceding unsigned comment added by 203.45.103.88 (talk) 23:16, 9 February 2011 (UTC)
[edit] SMART Attributes List
Some descriptions of the SMART attributes are clearly incorrect. "Load" refers to the operation and number of times the heads move from parked to unparked, not when the drive is seeking. GMR head amplitude refers to the signal from the read head, not any movement. "Read Channel Margin" description is content-free.
AAM and APM should be listed as follows:
AAM = Automatic Acoustic Management, APM = Advanced Power Management
[edit] Spin Retry Count
This description does not appear to correspond to actual data values in recent Western Digital and Seagate drives. Seagate posts 100-100-97-0-OK, which HDTune marks with a yellow (warning) bar, and WD posts 100-100-51-0-OK, which HDTune leaves unmarked. (Values correspond to Current-Worst-Threshold-Data-Status.) With respect to these two manufacturers, the description makes no sense. —The preceding unsigned comment was added by FUBARinSFO (talk • contribs) 00:39, 2 May 2007 (UTC).
[edit] Reallocated sectors
Please make it easier for me to **use** this information. Please enhance the Table entries.
Could we get practical and say, This is a down-counter, and, when zero, there is no way to deal with additional sectors whose read errors are too severe to be fixed with Error Correction Codes.
In the discussion above the Table, tell me, if there is no more space to absorb a sector needing reallocation, does my drive now pass errors up to the opsys file system, which reports read errors and/or other file unavailability?
In the table, you can make room by erasing: "the more sectors that are reallocated, the more read/write speed will decrease". This is true. However, it hardly matters to a user who is suffering data loss, possible data corruption, and potential boot failure, which blocks access to everything.
My smartctl under WinXP or Knoppix shows
Reallocated_Sector_Ct
VALUE 1
WORST 1
THRESHOLD 63 (Please confirm that a number less than 63 is bad -- but which number?)
This "pre-failure" category of parameter is UPDATED "always".
WHEN_FAILED is "FAILING_NOW". How can I tell? Because VALUE 1 is less than THRESHOLD 63?
The "RA" column (I do not know what this is. Do you? RAW counts?) is 12. 12 is not 1 and 12 is not 63. I wonder what 12 is.
This page has not yet evolved into a practical guide and it still lacks an accessible exposition of the topic's salient points. Nevertheless, and IMHO, it is already far ahead of most pages and posts on the Internet. So let's not stop now! Jerry-va 01:27, 29 May 2006 (UTC)
- The final column is RAW_VALUE, and 12 for the Reallocated_Sector_Ct attribute means that 12 bad sectors have been remapped. You should replace this disk soon, as that number will only rise, and the higher it gets, the more data you're going to lose. --Error28 12:03, 4 September 2006 (UTC)
It comes down to money. If you have reallocated sectors you should replace your disk. In my experience you can go much longer with reallocated sectors on desktop drives; once reallocated sectors show up in laptops they increase fast. Josiah 20:02, 12 October 2007 (UTC)
"This is why, on modern hard disks, "bad blocks" cannot be found while testing the surface — all bad blocks are hidden in reallocated sectors. " I don't think that's quite accurate, based on my understanding. When you write to a bad sector, sure, it gets silently reallocated. If you read the sector and the data is bad but corrected by ECC, the drive should correct it and copy it to a reallocated sector. But if the data is uncorrectable, an error must be returned, since it would be unacceptable to return bogus data. Moreover, it must continue to give an error on a future read, until the sector is rewritten. So you can sometimes find bad sectors by reading the entire disk. "Testing the surface" is confusingly vague. 76.254.84.64 07:30, 31 October 2007 (UTC)
I find this table entry confusing as well -- It says in the table "A decrease in the attribute value indicates bad sectors", *but* the 'Better' column indicates that a decrease in this number is a *good* thing?? It seems like the arrow should be changed for this table entry. If this is really an indicator of the number of sectors potentially available for reallocation (in the event of a bad sector being detected), then it would make sense that a higher number is better, a decrease is bad. When using applications such as HD Tune, under the 'Health' tab it tells me that my particular drive has 100 as the current count for this field, and that 36 is the threshold. It does not seem to see any problem with this and it is telling me that it is OK -- so it seems to coincide... ChrisTracy (talk) 18:09, 15 May 2008 (UTC)
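The readings quoted in this thread can be checked with a small sketch. It assumes the rule commonly applied by monitoring tools: an attribute counts as failing when its normalized VALUE is at or below a nonzero THRESHOLD, while RAW_VALUE is a separate vendor-defined count. The `SmartAttribute` class and `is_failing` function are hypothetical names used only for illustration; they are not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One row of an attribute table (hypothetical container for illustration)."""
    name: str
    value: int      # normalized current value
    worst: int      # lowest normalized value seen so far
    threshold: int  # vendor-defined failure threshold
    raw: int        # vendor-defined raw count


def is_failing(attr: SmartAttribute) -> bool:
    # Assumed rule: failing when the normalized value has dropped to or
    # below the threshold; a threshold of 0 is treated as "never fails".
    return attr.threshold > 0 and attr.value <= attr.threshold


# Readings quoted above: VALUE 1 vs THRESHOLD 63 (raw 12), and 100 vs 36.
jerry = SmartAttribute("Reallocated_Sector_Ct", 1, 1, 63, 12)
chris = SmartAttribute("Reallocated_Sector_Ct", 100, 100, 36, 0)
print(is_failing(jerry), is_failing(chris))
```

Under that assumption the first drive is reported as failing because 1 is below 63, and the raw value of 12 is simply the number of remapped sectors, not something to compare against the threshold.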
[edit] Temperature sensor
The section on temperature and temperature sensors is opinionated and somewhat incorrect / not up to date. All hard drives from 1998 onward include a temperature sensor. The reason: all modern hard drives use GMR (Giant Magneto Resistive) heads, which require very accurate temperature measurement to be able to read the data back (the difference between a 0 and a 1 readback is about the same order of magnitude as a 0.1 degree Celsius change in the GMR head).
Also, the temperature failure mode is not necessarily cumulative.
- my samsung 1999 drive (8 gb) may or may not have a T sensor, in any case it does not report about it in SMART. --145.253.2.236 (talk) 12:57, 12 July 2008 (UTC)
[edit] Curious sentence
<partsunkn> SMART is a system used to kill the drive when the warranty is up —Clarknova 03:35, 28 February 2006 (UTC)
Removed the following curious sentence from "working modality".
- Manufacturing companies which claim to support S.M.A.R.T. but withhold specific sensor information on individual products include Seagate, [...]
[...], indeed! What the frip. - 194.89.3.244 17:56, 28 February 2006 (UTC)
- But it's true. They specifically withhold the information. Do the research.
[edit] Read Error Rate description incorrect
Elsewhere I have read that a high value for Read Error Rate is good, and the attribute value decreases as the read error rate increases.
- That is simply incorrect. It's saying that the more errors the better. That's nonsense. —Preceding unsigned comment added by 90.5.11.225 (talk) 22:42, 1 November 2008 (UTC)
Consistent with this, the two SMART monitoring tools I have used alert the user when the Read Error Rate attribute value falls below a threshold.
This description deserves accuracy and careful explanation perhaps more than any other, since this attribute is so critical.
-- I think it means 'time between read errors'; the smaller the number, the higher the rate, but whether it's seconds, hours, or fortnights, I couldn't begin to guess.
- Perhaps a more logical definition would be 'no. of successful reads between errors'? --217.173.195.210 09:23, 14 August 2007 (UTC)
I agree that the description is incorrect. The higher the value the better. This isn't a "rate" per se - it should be regarded more as a score. All values are a max of 255 - most manufacturers see this as a 'percent good' - a value out of 100. —Preceding unsigned comment added by 64.7.157.226 (talk) 21:13, 14 October 2009 (UTC)
[edit] Frustration with SMART
I'd like it if the table spelled out what the "good" and "bad" values of the attributes are.
- The general rule is that higher is better than lower, except in the case of temperature. The specific thresholds of "OK" and "failing" are up to the manufacturers to specify. Most of the numbers involved are arbitrary and defined separately by the manufacturers. GreenReaper 16:07, 24 August 2006 (UTC)
- This is definitely not true. Load cycles and such? The higher the better? If that's the case then a drive with a load count of 600,000+ that just failed for good is the healthiest drive you can have. —Preceding unsigned comment added by 90.5.11.225 (talk) 22:44, 1 November 2008 (UTC)
[edit] More frustration...
I also feel that nobody gives you useful, human-readable information (for an average person, maybe IQ 100, no IT degree) about your hard disk. Something like "your hard disk /dev/hda is 2.8 years old; the probability that it will survive this month is 98% (suggested replacement value: 96%; suggested backup value: 99%)". Instead I'm confused by 1000 different values. How bad are they really? Where should the values be (see comment above)? They do not really help to make a business decision to replace or not to replace the drive. Can someone please shed some light on this? Can the smartmontools developers please think of the CTO's business decision of replacing or not replacing a disk? And some useful information for the home user. THANKS -- Michael Janich 09:15, 31 July 2006 (UTC)
- Most hard disks fail within the first two years; if one doesn't fail within those years, it is a good idea to keep it for another 3 years.
If you plot the failure rate of hard disks, it starts off very high, then drops to its lowest point around two years, and then slowly climbs back up to the rate at which it started. Hope that helps. Hqduong 08:10, 5 December 2006 (UTC)
- I question that. You are saying that most HDDs fail within two years? That's scary. What's worse: it's simply nonsense. Perhaps those that fail will most often fail within the first two years; but that is definitely not what you wrote.
- Google published a study on hard disks that claimed (based on memory, not citation) that 1) only half the disks that failed had something significant in their SMART readouts, and 2) only half the disks with something significant in their SMART readings actually failed. So after all the hoopla, it may not be that useful after all. --Alvestrand 07:39, 19 March 2007 (UTC)
- Disraeli had something insightful to say about this.
- Hard Disk Sentinel software can display information in an understandable way. It gives a textual description about the hard disks, displays the real number of problems found so you can have some ideas about the real status instead of displaying just some numbers/values. Because thresholds + value pairs and T.E.C. dates are not really able to predict hard disk failures, this software uses a completely different method to detect and display real hard disk problems found on IDE/SATA/USB/SCSI hard disks. Works under Windows, DOS, Linux. —Preceding unsigned comment added by 87.229.50.242 (talk) 08:12, 4 June 2008 (UTC)
[edit] SMART and RAID
Any idea if SMART can still be used on HDDs included in a RAID array? --Evolve2k 05:06, 7 January 2007 (UTC)
- I have seen some motherboards with hardware RAID support/PCI RAID expansion cards that have a BIOS/firmware capable of retrieving and displaying SMART data. No idea if there's anything out there that lets you do this in software though. SMART is a very mysterious technology IMO. --86.138.51.21 08:20, 26 January 2007 (UTC)
- I'm building a RAID array with four Seagate ST3320620AS (7200.10 320GB) drives in it. Once I get the second pair of drives I'll let you guys know. Using NVIDIA MediaShield on a P5N32-E SLI Plus. I can also confirm that BE is definitely a temperature sensor on that drive, btw. 66.146.62.42 22:23, 10 May 2007 (UTC)
[edit] Merging in Threshold Exceeds Condition
Since the mergeto tag of the Threshold Exceeds Condition article says to discuss the subject here:
- Merge. My opinion. --Alvestrand 22:16, 13 January 2007 (UTC)
[edit] Background
According to the cited Google study, SMART can predict about 40-60% of all drive failures, depending on the monitored attributes. The stated 30%, taken from some FAQ, might be too pessimistic here.
—The preceding unsigned comment was added by Michi cc (talk • contribs) 17:38, 21 April 2007 (UTC).
[edit] Attribute list is confusing
Some of the arrows in the attribute list don't appear to be correct. "Power On Hours" is marked with an up arrow, but I would think that a *lower* number of operating hours would be considered better, not a higher one. The same goes for calibration retries. It's also not clear in many of the descriptions whether the values being referred to are the raw values, normalized values, worst values, threshold values, or something else, making the table even more unintelligible to someone unfamiliar with SMART. All of this should be made much clearer. ::Travis Evans 11:39, 16 June 2007 (UTC)
- As of today, these issues now appear to be largely improved. ::Travis Evans (talk) 14:38, 16 December 2007 (UTC)
Set Load/Unload cycle count with a down arrow, as each time the head unloads/reloads it creates wear on the servo, and the read/write head may fail to load if it isn't loaded or unloaded completely. —Preceding unsigned comment added by 64.228.219.208 (talk) 03:59, 11 October 2007 (UTC)
[edit] Contradictory statement about higher vs lower
Note that the attribute values are always mapped to the range of 1 to 253 in a way that means higher values are better.
This is then followed by a chart which describes whether it is better to have lower or higher values, seeming to contradict the above sentence. Can someone please clarify or correct? Ham Pastrami (talk) 19:20, 24 November 2007 (UTC)
- I believe that the reason for this apparent conflict is that the chart refers to the “raw” attribute values rather than the “normalized” ones. For normalized values, the statement in the article that higher numbers are “always” better is almost correct (I'll explain why I say “almost” in a moment), but the raw values can follow any rule that the drive manufacturer wants.
- The biggest problem with the article, I think, is that it doesn't explain clearly enough that there are actually several different values involved for each attribute. The chart is totally unclear about it. The chart is also problematic because some of the attributes the chart describes appear to function in a totally different (even the exact opposite) manner on certain drives.
- The statement “...The attribute values are always mapped ...in a way...that higher values are better” also isn't true in the strictest sense, because I know of some drives (such as mine) which actually indicate the normalized temperature value directly in Celsius (e.g., a value of 40 means 40°C), which means that for this attribute, higher values are actually worse. This is likely a rare exception, though.
- I may attempt to greatly clarify the article myself some time if I get a chance, but if anyone else wants to do it right now instead, feel free to go ahead and do so. ::Travis Evans (talk) 21:58, 5 December 2007 (UTC)
- Okay, I just edited the Attributes section in the hope that it will now make much more sense. ::Travis Evans (talk) 14:35, 16 December 2007 (UTC)
[edit] More info on Selftest specifications please
A good article. You can get SMART drives to initiate either a short or a long self-test (managed by the drive itself). But what exactly does the SMART specification require a drive to do during these tests? Robin April 2008
[edit] Critical Attributes
The study at Google described in the Background section found four parameters strongly correlated with drive failure. The later table describes the SMART attributes, but even after reading the paper I don't see which attributes correspond to those four critical parameters. Here are the possibilities I see:
| name in paper | SMART attribute # | SMART attribute name |
|---|---|---|
| scan errors | 1 | Read Error Rate |
| | 187 | Reported Uncorrectable Errors |
| | 201 | Soft Read Error Rate |
| | 250 | Read Error Retry Rate |
| reallocations | 5 | Reallocated Sectors Count |
| offline reallocations | 198 | Offline Uncorrectable Sector Count |
| probational counts | 197 | Current Pending Sector Count |
(I note that smartctl calls attribute 198 "Offline_Uncorrectable"[1][2].)
The discussion of the study and/or the table entries should be revised so the correspondence is clear.
Also, the table has six attributes highlighted as "critical". What's the justification for those six, as opposed to the four parameters noted in the Google study? Jrvz (talk) 14:08, 8 July 2008 (UTC)
- Um, since your Q is what the Google authors used, you should probably mail them. Unfortunately, the attributes don't even have the same meaning across disks, e.g. Seagate disks report a large value for raw read errors. Scan errors might be Attribute 7, Seek errors (this is probably related to sector/track not found problems and if nonzero, basically means the disk is mechanically dying); "Read Error Rate" is probably the number of problems found reading a sector, but in most harddisks, this should be named Serious Read Error, and it's pretty cumulative, not a rate. Minor read errors are normal and auto-corrected, some disks tell you their number, like my Atlas reports 80k read errors every boot to smartctl. Attributes 201 and 250 aren't even remotely standard. -- "Offline" reallocations.. well, it's in the smartctl man page you quoted, it's probably a sector that is dead but has not yet been remapped. Should be #198, yes. -- "Probational Counts", I am really guessing here, but it sounds either like the minor read errors or how often a sector was classified as "not so good". The number of write problems are counted in High_Fly_Writes and in Multi_Zone_Error_Rate, IIRC (don't ask me what multi zones got to do with that, and no, it's not a rate either I think). --88.74.187.45 (talk) 10:14, 24 January 2009 (UTC)
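As a rough illustration of how the candidate mapping in the table above might be used, the sketch below flags any of those attributes whose raw value is nonzero. The mapping to the paper's parameters is the guess discussed in this thread, not something the paper confirms. Attribute 1 is left out on purpose because, as noted above, some vendors report large raw values for it on healthy drives. `CANDIDATE_IDS` and `nonzero_candidates` are hypothetical names.

```python
# Candidate attribute IDs taken from the table in this thread (unconfirmed mapping).
CANDIDATE_IDS = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}


def nonzero_candidates(raw_values):
    """Return {attribute name: raw value} for candidate attributes with raw > 0.

    raw_values maps attribute ID -> raw value, e.g. as parsed from a
    monitoring tool's output (the parsing itself is not shown here).
    """
    return {
        CANDIDATE_IDS[attr_id]: raw
        for attr_id, raw in raw_values.items()
        if attr_id in CANDIDATE_IDS and raw > 0
    }


# Attribute 1 and 9 are ignored; 197 is zero; only attribute 5 is flagged.
print(nonzero_candidates({1: 123456, 5: 12, 9: 5000, 197: 0}))
```

Treating any nonzero raw count as a warning is a simplification; it ignores vendor differences in how these counters are defined.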
---
What is attribute 188?
smartd[2647]: Device: /dev/sda, SMART Usage Attribute: 188 Unknown_Attribute changed from 96 to 100 —Preceding unsigned comment added by 76.119.201.189 (talk) 03:44, 10 July 2008 (UTC)
The "Scan Error" is not a simple "read" error but a "reallocation sector" error, because they talk of "first reallocation". Furthermore, its raw value is normally zero. I think it is #187 (Uncorrectable Sector Errors). According to the Google report, Scan Error is the most critical value! Why don't you highlight it in the reference table? --93.148.74.16 (talk) 09:30, 16 February 2011 (UTC)
[edit] Isn't "007 Seek_error_rate" critical?
The description seems to indicate it is, but it's not highlighted. (Love the highlighting btw, I wish smartmontools did it too). --82.134.28.194 (talk) 08:24, 30 July 2008 (UTC)
[edit] Panterasoft HDD Health not 100% freeware (non-commercial only)
The main page on panterasoft claims HDD Health is freeware, but if you install it and open the help->about it explains it's only free for non-commercial use. There's a help option that takes you to:
http://www.panterasoft.com/orderlnk.php?no=25
Which redirects to a web store selling commercial-use licenses for $29.95
Feel free to verify. —Preceding unsigned comment added by 71.248.110.143 (talk) 14:58, 1 October 2008 (UTC)
[edit] Split: Comparison of S.M.A.R.T. tools
This section is getting fairly big and would be easier to maintain away from the main article. If nobody disagrees I will go ahead with it. --Hm2k (talk) 00:17, 30 October 2008 (UTC)
- This has now taken place. Comparison of S.M.A.R.T. tools --Hm2k (talk) 16:22, 30 October 2008 (UTC)
[edit] Requested move
- The following discussion is an archived discussion of the proposal. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.
The result of the proposal was Move Parsecboy (talk) 14:57, 7 November 2008 (UTC)
As per WP:NC, "Titles should be brief without being ambiguous". This technology is almost always referred to as S.M.A.R.T. and almost never as "Self-Monitoring, Analysis, and Reporting Technology". This move would lead to a shorter, more manageable title and less confusion for readers. --Hm2k (talk) 15:19, 30 October 2008 (UTC)
- Support the expanded form is not known to users. 70.55.86.100 (talk) 08:33, 5 November 2008 (UTC)
- The above discussion is preserved as an archive of the proposal. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.
[edit] Start_stop attribute
My Samsung Spinpoint drive (from 1999 or 2000) counts starts AND stops in this attribute. start +=1, stop +=1, simply losing power += 0. This means this attribute is basically useless to tell you something about the usage history of the drive. A SV with Powercycle = 1000 and Start_stop = 1902 (raw) could have been subjected to a lot of spinup and spindowns (maybe from powersaving), or it could simply have had a careful owner who always parked it with the "poweroff" command. I wonder how other harddisks handle this. --92.78.30.160 (talk) 19:56, 19 January 2009 (UTC)
[edit] SMART predicts 64% of failures?
The reference is this page, but where is that data from? Is it reputable? Also that page says 30%. Family Guy Guy (talk) 15:49, 30 March 2009 (UTC)
[edit] 197 C5 Current Pending Sector Count
197 C5 Current Pending Sector Count
- Number of "unstable" sectors (waiting to be remapped). If the unstable sector is subsequently written or read successfully, this value is decreased and the sector is not remapped. Read errors on the sector will not remap the sector, it will only be remapped on a failed write attempt. This can be problematic to test because cached writes will not remap the sector, only direct I/O writes to the disk.
I am going to take out that last sentence, because I think it is wrong, and this is now getting quoted from this article, all over the internet.
I think the sentence is half-right. Only direct I/O writes let you know what happens. With a cached write, I am guessing that it will still potentially try to reallocate a sector on the "waiting" list, but that will happen after the cached write is initiated. If the remapping fails, the write will eventually fail, and the computer will get an error message -- but in some cases, the computer will have already assumed the write was OK.
In other words, with direct I/O you know immediately if there is a problem (as soon as you get a normal "completed" signal). With a cached write, you can't know if the proper remapping happened until after a delay, or after a "sync" or flush of the write cache.
I also question the use of the word "failed" in the previous sentence: "it will only be remapped on a failed write attempt". I question whether all drives would actually try to get a good write to a sector on the "pending" list, or just assume it is bad, and try to reallocate.
I'm just making all this up, based on a general understanding of computers. Someone who knows, and can find a good reference, should make the article more complete and accurate. -96.233.30.237 (talk) 23:02, 9 July 2009 (UTC)
[edit] Seagate raw Seek Error Rate attribute
I believe that Seagate's raw Seek Error Rate attribute stores the number of seek errors in the uppermost 16 bits, and the number of seeks in the lower 32 bits.
A drive begins life with a cooked value of 253 until it accumulates enough seeks for the data to be statistically significant, after which the cooked value starts off at 60. The cooked value then increases or decreases as errors appear.
The normalised attribute appears to follow a logarithmic pattern:
90% = < 1 error per 1000 million seeks
80% = < 1 error per 100 million
70% = < 1 error per 10 million
60% = < 1 error per million
50% = 10 errors per million
40% = 100 errors per million
30% = 1000 errors per million
I don't have any official confirmation for the above information. It is the result of my analyses of numerous SMART reports.
I have performed several tests in support of my hypothesis. These are described in Google's Usenet archives. I don't know whether they can be considered for inclusion in this article, possibly as references. They are certainly not authoritative, but I believe they will withstand scrutiny.
121.44.138.74 (talk) 07:18, 21 August 2009 (UTC)
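The hypothesis above can be written out as a short sketch. It assumes the 48-bit layout proposed in this comment (errors in the uppermost 16 bits, seeks in the lower 32 bits) and reads the logarithmic table as normalized ≈ -10 · log10(errors / seeks). Neither is officially confirmed. Treating zero errors as "fewer than one error" is an additional assumption made here so that the formula stays defined. Both function names are hypothetical.

```python
import math


def decode_seek_error_raw(raw48):
    """Split a 48-bit raw value under the layout hypothesized above."""
    errors = (raw48 >> 32) & 0xFFFF   # uppermost 16 bits: seek errors
    seeks = raw48 & 0xFFFFFFFF        # lower 32 bits: total seeks
    return errors, seeks


def estimated_normalized(errors, seeks):
    """Estimate the normalized value as -10 * log10(errors / seeks).

    Returns None when no seeks have been counted. Zero errors are treated
    as one error (assumption), which gives the "< 1 error per N seeks" rows.
    """
    if seeks <= 0:
        return None
    return -10.0 * math.log10(max(errors, 1) / seeks)


errors, seeks = decode_seek_error_raw((3 << 32) | 1000)
print(errors, seeks)  # 3 1000
print(estimated_normalized(10, 1_000_000))
```

With this reading, 10 errors per million seeks lands on the 50% row and a drive with no errors after a million seeks lands on 60, which matches the starting value described above.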
After watching my dying ST31000340AS Barracuda 7200.11 drive, I believe that Seagate's raw Seek Error Rate attribute, a 6-byte field, stores the number of seek errors in the lower bits [0:23] and the number of seeks in the upper bits [24:47], both values big-endian. But I could not compute meaningful 'normalized' values similar to the previous poster's. My seek error number was really high; maybe it never gets reset?
184.99.101.172 (talk) 03:17, 15 June 2011 (UTC)
[edit] Raw Read Error Rate
This Seagate forum thread discusses the meaning and behaviour of Seagate's Raw Read Error Rate attribute:
http://forums.seagate.com/stx/board/message?board.id=ata_drives&message.id=8700
It may help dispel fears about the relatively high numbers displayed by Seagate SMART reports. The RRER numbers actually reflect sector counts, not error counts.
I have verified that my Fujitsu drive interprets this attribute in a similar manner, except that its numbers are much smaller. —Preceding unsigned comment added by 121.44.138.74 (talk) 07:52, 21 August 2009 (UTC)
- It's amazing how far people will take spite against another manufacturer. Just so you folks know, the only other drives competing with Seagates are VelociRaptors, which are faster, but the difference is negligible. Extremely reliable drives with a rarely low failure rate, statistically irrefutable. —Preceding unsigned comment added by 24.17.18.249 (talk) 11:45, 20 April 2010 (UTC)
[edit] Seek Error Rate - thermal widening
The article refers to seek errors being the result of "thermal widening" of the platters. My understanding is that this was only an issue for stepper motor drives, or voice coil drives that had a separate servo surface. Modern drives use embedded servos, so the positioner will always be able to find a track, no matter how much the platter expands or contracts.
121.44.138.74 (talk) 08:08, 21 August 2009 (UTC)
[edit] Attribute 240
The product manual for Fujitsu MHY2xxxBH series drives identifies attribute 240 as "Transfer Error Rate":
"*If the device receives the reset during transferring the data, the transfer error is counted up."
http://www.msc-ge.com/download/itmain/datasheets/fujitsu/MHY2xxxBH.pdf 121.44.98.124 (talk) 09:19, 22 September 2009 (UTC)
- Thank you! Reliable information about SMART values is extremely hard to come by, this is helpful. -- intgr [talk] 02:50, 20 January 2010 (UTC)
[edit] Attributes 185 & 186 (WDC)
The following SMART attribute names were extracted from WDC's wdtler.exe. There are two attributes (185 & 186) that do not appear in the Wikipedia list.
WDTLER 1.03 Copyright (C) 2004-2006 Western Digital Corporation Western Digital Time Limit Error Recovery Utility
http://zacuke.com/files/wdtler.ZIP
Raw Read Error Rate
Throughput Performance
Spin Up Time
Start/Stop Count
Re-allocated Sector Count
Read Channel Margin
Seek Error Rate
Seek Time Performance
Power-On Hours Count
Spin Retry Count
Drive Calibration Retry Count
Drive Power Cycle Count
Soft Read Error Rate
End to End Error Count
Head Stability - attribute 185 ?
Induced Op-Vibration Detection - attribute 186 ?
Reported Uncorrectable Errors
Command Time Out
High Fly Writes
Airflow Temperature
G-Sense Error Rate
Emergency Retract Count
Load/Unload Count
HDA Temperature
Hardware ECC Recovered
Relocation Event Count
Current Pending Sector Count
Offline Uncorrectable Sector Count
UltraDMA CRC Error Rate
Multi Zone Error Rate
Soft Read Error Rate
Data Address Mark Errors
Run Out Cancel
Soft ECC Correction
Thermal Asperity Rate
Flying Height
Spin High Current
Spin Buzz
Offline Seek Performance
Disk Shift
G-Sense Error Rate
Loaded hours
Load/Unload Retry Count
Load Friction
Load/Unload Cycle Count
Load-in Time
Torque Amplification Count
Power-Off Retract Count
GMR Head Amp
Temperature
Head Flying Hours
Read Error Retry Rate
121.44.85.40 (talk) 21:33, 2 October 2009 (UTC)
[edit] SMART Transmitters
I'd like to know whether SMART transmitters, used mainly in oil and gas plants, can be categorized under this technology. I recently tried to search the internet for the meaning of SMART in such transmitters. These transmitters usually support the HART, FOUNDATION™ fieldbus, Modbus, and/or Profibus protocols. --Email4mobile (talk) 15:14, 18 October 2009 (UTC)
- SMART's protocols appear to be ATA & SCSI. However, the general idea is the same. Back in the 80s, I claimed no one would have a PC (that was silly): more useful was firmware to examine & control devices, and a workstation to analyze the data; or both firmware controls & processor embedded in a clothes iron, for example, to tell you when it is fully heated & warn you when it tips over; or an automobile, so mechanics could plug in a computer & analyze the problems. My medical thermometer beeps when it's fully heated. SMART gives us the 'raw' data from specific sensors on ATA & SCSI devices (another technology lets us adjust their speed): but where is the sophisticated analysis software? Also, I had no idea it worked on flash drives; does it work on ATA optical drives? Geologist (talk) 21:14, 10 November 2009 (UTC)
[edit] More WDC SMART attributes
The following attributes were extracted from WD's wdidle3.exe utility, after unpacking it with UPX.
http://www.synology.com/support/faq_images/enu/wdidle3.zip
Raw Read Error Rate
Throughput Performance
Spin Up Time
Start/Stop Count
Re-allocated Sector Count
Read Channel Margin
Seek Error Rate
Seek Time Performance
Power-On Hours Count
Spin Retry Count
Drive Calibration Retry Count
Drive Power Cycle Count
Soft Read Error Rate
SATA Downshift Error Count
End to End Error Det/Corr Count
Head Stability
Induced Op-Vibration Detection
Reported Uncorrectable Errors
Command Time Out
High Fly Writes
Airflow Temperature
Shock Sense
Emergency Retract Cycle Count
Load/Unload Cycle Count
HDA Temperature
ECC on the Fly Count
Re-allocated Sector Event
Current Pending Sector Count
Offline Uncorrectable Sector Count
UltraDMA CRC Error Rate
Multi Zone Error Rate
Soft Read Error Rate
Data Address Mark Errors
Run Out Cancel
Soft ECC Correction
Thermal Asperity Rate
Flying Height
Spin High Current
Spin Buzz
Offline Seek Performance
Disk Shift
G-Sense Error Rate
Loaded hours
Load/Unload Retry Count
Load Friction
Load/Unload Cycle Count
Load-in Time
Torque Amplification Count
Power-Off Retract Count
GMR Head Amp
Temperature
Head Flying Hours
Total LBAs written
Total LBAs read
Read Error Retry Rate
Free Fall Sensor
121.44.79.8 (talk) 22:35, 22 October 2009 (UTC)
[edit] Implementations & Analysis
The above is a lot of information; but the value of each is just a raw datum. My Linux implementation doesn't log the rates of change in values, and it evaluates the implication of each individually: 'old', 'may fail soon'. Scientists would perform a cluster analysis of the above values, their rates, & their accelerations with time (using 'rcs', for example). Then they would examine the history of each cluster and label them 'dropped laptop', 'defective from factory', 'very old', 'bad RAM', &c.
Instead of 'may fail soon', I'm surprised we don't have both software & analyses by companies to draw upon, allowing much more informative messages. (What are companies doing with all the above data? They wouldn't collect it if they weren't using it.) I see no reason why companies haven't done this, and administrators of large servers & company LANs don't calculate the % chance of (or mean time to) failure described nicely above.
Also, do LAN administrators write scripts that collect these data during LAN backups? How do they decide when to replace a disk? Are there papers? Are such studies as the above proprietary?
This is a fine article, but these are some of the questions that people who look it up are probably seeking to answer. Geologist (talk) 20:51, 10 November 2009 (UTC)
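As a rough illustration of the "rates and accelerations" idea raised above, here is a minimal Python sketch that takes timestamped raw values of one attribute and computes finite-difference rates of change and their accelerations. The sample history, the function names, and the choice of hours as the time unit are all assumptions made for illustration; this is not how any existing SMART tool works, and a real cluster analysis would need far more than this.

```python
from typing import List, Tuple

# One sample is (time in hours, numeric attribute value).
Sample = Tuple[float, float]


def finite_differences(samples: List[Sample]) -> List[Sample]:
    """Return (midpoint_time, slope) for each consecutive pair of samples."""
    slopes = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must be in strictly increasing time order")
        slopes.append(((t0 + t1) / 2.0, (v1 - v0) / (t1 - t0)))
    return slopes


def rate_and_acceleration(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """First differences give the rate; differencing the rates gives the acceleration."""
    rates = finite_differences(samples)
    accelerations = finite_differences(rates)
    return rates, accelerations


# Hypothetical history of a Reallocated Sectors Count raw value, sampled daily.
history = [(0.0, 0.0), (24.0, 0.0), (48.0, 2.0), (72.0, 8.0)]
rates, accelerations = rate_and_acceleration(history)
```

A growing rate together with a positive acceleration, as in this made-up history, is the kind of feature a clustering step could then work with.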
[edit] SSD & CompactFlash SMART attributes
http://www.hsgi.jp/documents/Delkin-Solid-State-SATA-Drive-Engineering-Specification.pdf
| ID | Attribute |
|---|---|
| 9 | Power-On Hours |
| 12 | Power On Count |
| 175 | Program Fail Count (chip) |
| 176 | Erase Fail Count (chip) |
| 177 | Wear Leveling Count |
| 178 | Used Reserved Block Count (Chip) |
| 179 | Used Reserved Block Count (Total) |
| 180 | Unused Reserved Block Count (Total) |
| 181 | Program Fail Count (Total) |
| 182 | Erase Fail Count (Total) |
| 183 | Runtime Bad Block (Total) |
| 187 | Uncorrectable Error Count |
| 195 | ECC rate |
| 198 | Off-line Uncorrectable Error Count |
| 199 | CRC Error Count |
http://www.arrowne.com/solid-state/pdf/STEC%20MACH4%20Datasheet.pdf
SLCFxGM4U(I)(-M) CompactFlash Card
| ID | Name | Description | Type |
|---|---|---|---|
| 1 | Raw Read Error | Count of raw data errors while data from media, including retry errors or data error (uncorrectable) | Warranty |
| 2 | Throughput Performance | Internally measured average and worst data transfer rate | Warranty |
| 5 | Reallocated Sector Count | Count of reallocated blocks. In the case of the CF card, this will be count of reallocated or remapped blocks during normal operation from the grown defect table | Warranty |
| 9 | Power On Hours | Number of hours elapsed in the Power-On state | Advisory |
| 12 | Power Cycle | Number of power-on events | Advisory |
| 13 | Soft Read Error Rate | Number of corrected read errors reported to the operating system (SLC = 3 or more bits; MLC = 5 or more bits) | Advisory |
| 100 | Erase/program cycles | Count of erase program cycles for entire card | Advisory |
| 103 | Translation Table Rebuild | Power backup fault or internal error resulting in loss of system unit tables | Advisory |
| 170 | Reserved Block Count | Number of reserved spares for bad block handling | Warranty |
| 171 | Program Fail Count | Count of flash program failures | Advisory |
| 172 | Erase Fail Count | Count of flash erase command failures | Advisory |
| 173 | Wear Leveling Count | Worst case erase count | Advisory |
| 174 | Unexpected Power Loss | Attribute counts number of unexpected power loss events | Advisory |
| 184 | End-to-end error detection | Tracks the number of end to end internal card data path errors that were detected | Warranty |
| 187 | Reported Uncorrectable Errors | Number of uncorrectable errors reported at the interface. | Advisory |
| 188 | Command Timeout | Tracks the number of command time outs as defined by an active command being interrupted | Advisory |
| 194 | Temperature | Temperature of the base casting. | Advisory |
| 196 | Reallocation Event | Total number of remapping events during normal operation and offline surface scanning. | Advisory |
| 198 | Offline Surface Scan | # of uncorrected errors that occurred during offline scan. | Advisory |
| 199 | UDMA CRC Error | Number of CRC errors during UDMA mode | Advisory |
http://www.stec-inc.com/downloads/flash_datasheets/SLMPCIxGM4U_M_61000_05494.pdf
SLMPCIxGM4U-M mPCI-Express IDE Card
| ID | Name | Description | Type |
|---|---|---|---|
| 1 | Raw Read Error | Count of raw data errors while data from media, including retry errors or data error (uncorrectable) | Warranty |
| 2 | Throughput Performance | Internally measured average and worst data transfer rate | Warranty |
| 5 | Reallocated Sector Count | Count of reallocated blocks. In the case of the mPCI-Express IDE Card, this will be count of reallocated or remapped blocks during normal operation from the grown defect table | Warranty |
| 9 | Power On Hours | Number of hours elapsed in the Power-On state | Advisory |
| 12 | Power Cycle | Number of power-on events | Advisory |
| 13 | Soft Read Error Rate | Number of corrected read errors reported to the operating system (SLC = 3 or more bits; MLC = 5 or more bits) | Advisory |
| 100 | Erase/program cycles | Count of erase program cycles for entire card | Advisory |
| 103 | Translation Table Rebuild | Power backup fault or internal error resulting in loss of system unit tables | Advisory |
| 170 | Reserved Block Count | Number of reserved spares for bad block handling | Warranty |
| 171 | Program Fail Count | Count of flash program failures | Advisory |
| 172 | Erase Fail Count | Count of flash erase command failures | Advisory |
| 173 | Wear Leveling Count | Worst case erase count | Advisory |
| 174 | Unexpected Power Loss | Attribute counts number of unexpected power loss events | Advisory |
| 184 | End-to-end error detection | Tracks the number of end to end internal card data path errors that were detected | Warranty |
| 187 | Reported Uncorrectable Errors | Number of uncorrectable errors reported at the interface. | Advisory |
| 188 | Command Timeout | Tracks the number of command time outs as defined by an active command being interrupted | Advisory |
| 194 | Temperature | Temperature of the base casting. | Advisory |
| 196 | Reallocation Event | Total number of remapping events during normal operation and offline surface scanning. | Advisory |
| 198 | Offline Surface Scan | # of uncorrected errors that occurred during offline scan. | Advisory |
| 199 | UDMA CRC Error | Number of CRC errors during UDMA mode | Advisory |
http://www.satron.at/pdf/NSSD_25_SATA.pdf
Serial ATA NSSD (NAND based Solid State Drive)
Attribute ID Numbers: Any nonzero value in the Attribute ID Number indicates an active attribute. The device supports the following Attribute ID Numbers. The names marked with (*) indicate that the corresponding Attribute Value is a fixed value for compatibility.
| ID | Attribute Name |
|---|---|
| 0 | Indicates that this entry in the data structure is not used * |
| 1 | Raw Read Error Rate * |
| 3 | Spin Up Time * |
| 4 | Spin Up Count * |
| 5 | Reallocated Sector Count * |
| 7 | Seek Error Rate * |
| 8 | Seek Time Performance * |
| 9 | Power-On Hours |
| 10 | Spin Retry Count * |
| 11 | LUL Retry Count * |
| 12 | Power On Count |
| 184 | Buffer CRC Count * |
| 187 | Uncorrectable Error Count |
| 188 | Command Time-out Error Count * |
| 190 | Air Flow Temperature * |
| 191 | Shock Count * |
| 192 | Emergency Retract * |
| 193 | LUL Count * |
| 194 | User Temperature * |
| 195 | ECC rate |
| 197 | Pending Sector Count * |
| 198 | Off-line Uncorrectable Error Count |
| 199 | CRC Error Count |
| 200 | Used Reserved Block Count |
| 201 | Program Fail Count |
| 202 | Erase Fail Count |
| 203 | Wear Leveling Count |
121.44.19.141 (talk) 22:28, 25 October 2009 (UTC)
[edit] Resetting attributes
Some attributes would be useful to reset. The "UDMA CRC error" count, for example, is caused by cable problems (damaged cable, bad shielding, bad PSU voltage, ...). After replacing the cable or moving the HDD into another computer, we should be able to reset that attribute. A "smart" idea, isn't it?
But there seems to be no way to reset SMART attributes as easily as we would like.
1) Please add here other obvious examples of SMART attributes that would be useful to reset when the working environment changes. 2) Please mention utilities that can reset them, or other tricks (updating the firmware, filling the platters with zeros, ...).
I stress that there are bad or dishonest reasons to reset some attributes, e.g. the total power-on counter and so on. I am not asking how to reset those.
Kind regards, LaPeche35, France —Preceding unsigned comment added by 213.56.248.115 (talk) 10:04, 18 November 2009 (UTC)
[edit] Attribute descriptions copied directly from other sources.
This article may contain text that is copied verbatim from other sources. For example, the description of the "Reallocated Sectors" attribute is identical word-for-word to the description at [3] (archived at [4]). I do not know whether that page copied from the Wikipedia article, or the other way around, or whether both copied from the same other source. --24.190.224.244 (talk) 17:23, 18 February 2010 (UTC)
[edit] Deleted wrong citation
I've deleted the sentence below from the article. It would be informative if true, but the cited page doesn't corroborate it:
Approximately 64% of failures can be predicted by S.M.A.R.T.<ref>[http://smartlinux.sourceforge.net/smart/faq.php?#2 How does S.M.A.R.T. work?]</ref> —Preceding unsigned comment added by 123.222.33.67 (talk) 06:46, 26 March 2010 (UTC)
[edit] Windows Software
Is there any good Windows software to access S.M.A.R.T.? I've found the Linux smartd/smartctl tools to be very useful, but such things aren't for everyone. —Preceding unsigned comment added by 142.179.217.154 (talk) 01:21, 28 July 2010 (UTC)
- Smartmontools is available under Windows as well (conveniently with GSmartControl). You need to install some GTK package, I think. SpeedFan is another tool that reads SMART data. Lavalys' Everest does too. And here is a whole list of them: Comparison_of_S.M.A.R.T._tools --Echosmoke (talk) 02:34, 19 January 2011 (UTC)
[edit] Accessibility and readability improvements
I've just made a few changes according to this discussion about readability / visibility concerns for some graphics. I've also made a few accessibility improvements and simplified the syntax at the same time. The down icon doesn't show up just yet because of a large-scale software bug. Hopefully it will be fixed soon; we shouldn't remove it for that reason in the meantime. Yours, Dodoïste (talk) 21:38, 22 August 2010 (UTC)
[edit] Checking the SMART
Nowhere in the article does it say how to check/read the information stored in the SMART data ;). Could somebody please expand the article? --Leonardo Da Vinci (talk) 10:41, 7 December 2010 (UTC)
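One common way to read the data is the smartctl tool mentioned elsewhere on this page, whose `-A` option prints the attribute table. The Python sketch below parses text in that tabular layout. The sample text is made up for illustration (it is not from a real drive), the function name is hypothetical, and the parser assumes every attribute row has ten whitespace-separated columns with numeric VALUE, WORST and THRESH fields, which may not hold for every drive or smartctl version.

```python
from typing import Dict, List

# Illustrative text in the column layout printed by `smartctl -A`.
# The numbers are invented for this example.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4721
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34 (Min/Max 18/45)
"""


def parse_attributes(text: str) -> List[Dict[str, object]]:
    """Extract id, name, normalized value, worst, threshold and raw value per row."""
    rows = []
    for line in text.splitlines():
        # Split into at most 10 fields so the raw value keeps its internal spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and anything that is not an attribute row
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9].strip(),
        })
    return rows


rows = parse_attributes(SAMPLE_OUTPUT)
```

On a machine with smartmontools installed, the input text could come from running `smartctl -A` against a device such as `/dev/sda` (the device path is an assumption, and administrator privileges are usually required).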
[edit] Attribute values, raw values, and thresholds
I'm looking at both a Western Digital and a Hitachi hard drive right now, and it seems clear that manufacturers have broad leeway with the SMART data displayed.
Some "real" values appear in the "raw value" field, such as Reallocated_Sector_Ct, while the "value" field is uninformative (displaying "100"). Other "real" values appear in the "value" field, such as Temperature, while the "raw value" is false or at least meaningless (displaying 159 billion +/- a few).
I suspect that the "Threshold" value for some attributes is a number that should not be exceeded (e.g. Reallocated_Sector_Ct), and for other attributes is a number that should always be exceeded (e.g. some kind of percentage-of-original-performance number).
So, to get useful information, I suggest first looking at the "value" field to see whether it is likely invalid (e.g. a round number like 100), then considering the "raw value" field, and then deciding whether a lower-than or greater-than threshold makes the most sense for the attribute you are looking at.
And to get the very best information, the data probably should be gathered continuously over time so that sudden changes can be noted. --Scalveg (talk) 23:19, 17 May 2011 (UTC)
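A minimal Python sketch of the two checks discussed above: comparing the normalized value with the threshold, and spotting sudden changes in a series of raw values. It assumes the convention documented by smartmontools, under which an attribute counts as failed when its normalized value is less than or equal to the threshold; whether a given drive follows that convention for every attribute is exactly the open question raised here. The function names, the sample series and the `max_step` limit are hypothetical.

```python
from typing import List


def has_failed(value: int, thresh: int) -> bool:
    """Assumed convention: failed when normalized value <= threshold."""
    return value <= thresh


def sudden_jumps(raw_history: List[int], max_step: int) -> List[int]:
    """Return the indices where the raw value rose by more than max_step
    compared with the previous sample."""
    return [
        i
        for i in range(1, len(raw_history))
        if raw_history[i] - raw_history[i - 1] > max_step
    ]


# Hypothetical raw readings of one attribute, taken at regular intervals.
readings = [0, 0, 1, 9, 9]
jumps = sudden_jumps(readings, max_step=2)
```

A suitable `max_step` would differ per attribute, so it is left as a parameter rather than guessed.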
[edit] SSD SMART attributes
Seems like the table should reflect some attributes used by SSDs. Some key ones on my Intel SSD seem to be:
- media wearout indicator -- % of drive lifetime left based on number of writes/number of rated writes
- host writes count -- number of sectors written from system's perspective, divided by 65536 (it seems)
- available reserved space -- % of reserved space (probably drops rapidly near end of drive life)
All that and more is in Intel's manual. And other controller makers must have their own.
There are also places that the article's wording, and SMART's, are wrong for SSDs -- e.g., smartctl is telling me about "spin-up time" and "number of attempts to compensate for platter speed variations" which clearly don't literally apply to the drive. Presumably we just don't care. Maybe it's worth a sentence or two on SSDs' failure mode (flash wearout, flaking controller logic) anyway. 173.164.250.233 (talk) 18:02, 1 June 2011 (UTC)
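The "divided by 65536" observation above can be turned into a small worked conversion. This Python sketch assumes that one unit of the host writes raw value really does stand for 65536 sectors (the comment above only says "it seems") and that a sector is 512 bytes; both constants and the function name are assumptions, not values taken from Intel's manual.

```python
SECTORS_PER_UNIT = 65536  # from the observation above; unverified
BYTES_PER_SECTOR = 512    # assumption: 512-byte logical sectors


def host_writes_bytes(raw_count: int) -> int:
    """Convert a host-writes raw count into bytes under the assumptions above."""
    return raw_count * SECTORS_PER_UNIT * BYTES_PER_SECTOR


# Under these assumptions one unit is 65536 * 512 bytes = 32 MiB,
# so 32 units correspond to 1 GiB written by the host.
one_unit_mib = host_writes_bytes(1) / 2**20
thirty_two_units_gib = host_writes_bytes(32) / 2**30
```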
[edit] Footnote #2 URL is invalid
Footnote #2 URL on Google.com is returning 404-not found. It is the article about the disk wear analysis. — Preceding unsigned comment added by Bmomjian (talk • contribs) 00:47, 29 December 2011 (UTC)
[edit] Different sections for SMART attributes
I think it would be better to split the SMART attributes across more than a single table, because each vendor has its own attribute names (sometimes at the same ID as another vendor's) with different meanings. This situation has become more marked with the introduction of SSDs. I propose one table for classical HDDs and one table for each SSD manufacturer. Samples for these tables can be found here for classical HDDs, and then for the SSD manufacturers Indilinx, Intel, JMicron, Micron, Samsung, SandForce, Satron, SMART and STEC. 95.232.243.104 (talk) 18:20, 17 January 2012 (UTC)