Self-Monitoring, Analysis and Reporting Technology

Revision as of 14:36, 6 July 2014

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system for computer hard disk drives (HDDs) and solid-state drives (SSDs)^[1] to detect and report on various indicators of reliability, in the hope of anticipating failures.

When a failure is anticipated by S.M.A.R.T., the user may choose to replace the drive to avoid unexpected outage and data loss. The manufacturer may be able to use the S.M.A.R.T. data to discover where faults lie and prevent them from recurring in future drive designs.

Background

Hard disk failures fall into one of two basic classes:

Predictable failures result from slow processes such as mechanical wear and gradual degradation of storage surfaces. Monitoring can determine when such failures are becoming more likely.
Unpredictable failures happen suddenly and without warning. They range from electronic components becoming defective to a sudden mechanical failure (perhaps due to improper handling).

Mechanical failures account for about 60% of all drive failures.^[2] While the eventual failure may be catastrophic, most mechanical failures result from gradual wear and there are usually certain indications that failure is imminent. These may include increased heat output, increased noise level, problems with reading and writing of data, or an increase in the number of damaged disk sectors.

Work at Google on over 100,000 drives over a 9-month period found correlations between certain SMART information and actual failure rates. In the 60 days following the first off-line scan uncorrectable error on a drive (SMART attribute 0xC6 or 198), the drive was, on average, 39 times more likely to fail than it would have been if no such error occurred. First errors in reallocations, offline reallocations (SMART attributes 0xC4 and 0x05 or 196 and 5) and probational counts (SMART attribute 0xC5 or 197) were also strongly correlated to higher probabilities of failure. Conversely, little correlation was found for increased temperature and no correlation for usage level. However, the research showed that a large proportion (56%) of the failed drives failed without recording any count in the 'four strong S.M.A.R.T. warnings' identified as scan errors, reallocation count, offline reallocation and probational count. Further, 36% of drives failed without recording any S.M.A.R.T. error at all (except temperature), meaning that S.M.A.R.T. data alone was of limited usefulness in anticipating failures.^[3]

PCTechGuide's page on SMART (2003)^[4] comments that the technology has gone through three phases:

In its original incarnation SMART provided failure prediction by monitoring certain online hard drive activities. A subsequent version improved failure prediction by adding an automatic off-line read scan to monitor additional operations. The latest "SMART" technology not only monitors hard drive activities but adds failure prevention by attempting to detect and repair sector errors. Also, while earlier versions of the technology only monitored hard drive activity for data that was retrieved by the operating system, this latest SMART tests all data and all sectors of a drive by using "off-line data collection" to confirm the drive's health during periods of inactivity.

History and predecessors

An early hard disk monitoring technology was introduced by IBM in 1992 in its IBM 9337 Disk Arrays for AS/400 servers using IBM 0662 SCSI-2 disk drives.^[5] Later it was named Predictive Failure Analysis (PFA) technology. It measured several key device health parameters and evaluated them within the drive firmware. Communications between the physical unit and the monitoring software were limited to a binary result: namely, either "device is OK" or "drive is likely to fail soon".

Later, another variant, which was named IntelliSafe, was created by computer manufacturer Compaq and disk drive manufacturers Seagate, Quantum, and Conner.^[6] The disk drives would measure the disk’s "health parameters", and the values would be transferred to the operating system and user-space monitoring software. Each disk drive vendor was free to decide which parameters were to be included for monitoring, and what their thresholds should be. The unification was at the protocol level with the host.

Compaq submitted its implementation to the Small Form Factor (SFF) committee for standardization in early 1995.^[7] It was supported by IBM, by Compaq's development partners Seagate, Quantum, and Conner, and by Western Digital, which did not have a failure prediction system at the time. The Committee chose IntelliSafe's approach, as it provided more flexibility. The resulting jointly developed standard was named SMART.

That SFF standard described a communication protocol for an ATA host to use and control monitoring and analysis in a hard disk drive, but did not specify any particular metrics or analysis methods. Later, "SMART" came to be understood (though without any formal specification) to refer to a variety of specific metrics and methods and to apply to protocols unrelated to ATA for communicating the same kinds of things.

Information provided

The technical documentation for SMART is in the AT Attachment (ATA) standard. First introduced in 2004,^[8] it has undergone regular revisions,^[9] the latest being in 2008.^[10]

The most basic information that SMART provides is the SMART status. It provides only two values: "threshold not exceeded" and "threshold exceeded". Often these are represented as "drive OK" or "drive fail" respectively. A "threshold exceeded" value is intended to indicate that there is a relatively high probability that the drive will not be able to honor its specification in the future: that is, the drive is "about to fail". The predicted failure may be catastrophic or may be something as subtle as the inability to write to certain sectors, or perhaps slower performance than the manufacturer's declared minimum.

The SMART status does not necessarily indicate the drive's past or present reliability. If a drive has already failed catastrophically, the SMART status may be inaccessible. Alternatively, if a drive has experienced problems in the past, but the sensors no longer detect such problems, the SMART status may, depending on the manufacturer's programming, suggest that the drive is now sound.

The inability to read some sectors is not always an indication that a drive is about to fail. One way that unreadable sectors may be created, even when the drive is functioning within specification, is through a sudden power failure while the drive is writing. Also, even if the physical disk is damaged at one location, such that a certain sector is unreadable, the disk may be able to use spare space to replace the bad area, so that the sector can be overwritten.^[11]

More detail on the health of the drive may be obtained by examining the SMART Attributes. SMART Attributes were included in some drafts of the ATA standard, but were removed before the standard became final. The meaning and interpretation of the attributes varies between manufacturers, and are sometimes considered a trade secret for one manufacturer or another. Attributes are further discussed below.^[12]

Drives with SMART may optionally maintain a number of 'logs'. The error log records information about the most recent errors that the drive has reported back to the host computer. Examining this log may help one to determine whether computer problems are disk-related or caused by something else. Error log timestamps may "wrap" after 2³² ms = 49.71 days.^[13]
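The 49.71-day wrap period follows directly from the size of the counter. The short Python sketch below reproduces the arithmetic, assuming (as the 2³² ms figure implies) an unsigned 32-bit millisecond counter; the function names are illustrative and not part of any S.M.A.R.T. tool.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def wrap_period_days(counter_bits: int = 32) -> float:
    """Days until a millisecond counter of the given width returns to zero."""
    return (2 ** counter_bits) / MS_PER_DAY


def wrapped_timestamp(elapsed_ms: int, counter_bits: int = 32) -> int:
    """Value such a counter would show after elapsed_ms milliseconds."""
    return elapsed_ms % (2 ** counter_bits)


if __name__ == "__main__":
    print(round(wrap_period_days(), 2))  # 49.71
```

Because of the wrap, two log entries with similar timestamps may in fact be a multiple of roughly 49.71 days apart.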

A drive that implements SMART may optionally implement a number of self-test or maintenance routines, and the results of the tests are kept in the self-test log. The self-test routines may be used to detect any unreadable sectors on the disk, so that they may be restored from back-up sources (for example, from other disks in a RAID). This helps to reduce the risk of incurring permanent loss of data.

Standards and implementation

Lack of common interpretation

Many motherboards display a warning message when a disk drive is approaching failure. Although an industry standard exists among most major hard drive manufacturers,^[4] there are some remaining issues and much proprietary "secret knowledge" held by individual manufacturers as to their specific approach. As a result, S.M.A.R.T. is not always implemented correctly on many computer platforms, due to the absence of industry-wide software and hardware standards for S.M.A.R.T. data interchange.^{[citation needed]}

From a legal perspective, the term "S.M.A.R.T." refers only to a signaling method between internal disk drive electromechanical sensors and the host computer. Hence, a drive may be claimed by its manufacturers to implement S.M.A.R.T. even if it does not include, say, a temperature sensor, which the customer might reasonably expect to be present. Moreover, in the most extreme case, a disk manufacturer could, in theory, produce a drive which includes a sensor for just one physical attribute, and then legally advertise the product as "S.M.A.R.T. compatible".^{[citation needed]}

Visibility to host systems

Depending on the type of interface being used, some S.M.A.R.T.-enabled motherboards and related software may not communicate with certain S.M.A.R.T.-capable drives. For example, few external drives connected via USB and Firewire correctly send S.M.A.R.T. data over those interfaces. With so many ways to connect a hard drive (SCSI, Fibre Channel, ATA, SATA, SAS, SSA, and so on), it is difficult to predict whether S.M.A.R.T. reports will function correctly in a given system.

Even with a hard drive and interface that implements the specification, the computer's operating system may not see the S.M.A.R.T. information because the drive and interface are encapsulated in a lower layer. For example, they may be part of a RAID subsystem in which the RAID controller sees the S.M.A.R.T.-capable drive, but the main computer sees only a logical volume generated by the RAID controller.

On the Windows platform, many programs designed to monitor and report S.M.A.R.T. information will function only under an administrator account. At present, S.M.A.R.T. is implemented individually by manufacturers, and while some aspects are standardized for compatibility, others are not.

Access

For a list of programs that can read S.M.A.R.T. data, see Comparison of S.M.A.R.T. tools.

ATA S.M.A.R.T. attributes

Each drive manufacturer defines a set of attributes,^[14]^[6] and sets threshold values beyond which attributes should not pass under normal operation. Each attribute has a raw value, whose meaning is entirely up to the drive manufacturer (but often corresponds to counts or a physical unit, such as degrees Celsius or seconds), a normalized value, which ranges from 1 to 253 (with 1 representing the worst case and 253 representing the best) and a worst value, which represents the lowest recorded normalized value. Depending on the manufacturer, a value of 100 or 200 will often be chosen as the initial normalized value.^{[citation needed]}
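The relationship between the raw, normalized, worst and threshold values can be sketched as a small Python data structure. This is an illustration only: the class and field names are hypothetical, and the rule that a normalized value at or below the threshold counts as "threshold exceeded" is an assumption based on the common convention that higher normalized values are better.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # current normalized value (1 = worst, 253 = best)
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-set limit for normal operation
    raw: int        # vendor-defined raw value (count, degrees Celsius, ...)

    def update(self, new_value: int) -> None:
        """Record a new normalized value and keep track of the worst one."""
        self.value = new_value
        self.worst = min(self.worst, new_value)

    def threshold_exceeded(self) -> bool:
        # Assumption: "exceeded" means the normalized value has fallen
        # to or below the threshold.
        return self.value <= self.threshold

    def ever_exceeded(self) -> bool:
        """True if the worst recorded value has reached the threshold."""
        return self.worst <= self.threshold
```

For example, a hypothetical attribute created with `value=100`, `worst=100` and `threshold=36` reports `threshold_exceeded()` as false until `update()` is called with a value of 36 or less; if the value later recovers, `ever_exceeded()` still reports the past excursion.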

Manufacturers that have implemented at least one SMART attribute in various products include Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Toshiba, Intel, STEC Inc, Western Digital and ExcelStor Technology.

Known ATA S.M.A.R.T. attributes

The following chart lists some S.M.A.R.T. attributes and the typical meaning of their raw values. Normalized values are always mapped so that higher values are better (with only very rare exceptions such as the "Temperature" attribute on certain Seagate drives^[15]), but higher raw attribute values may be better or worse depending on the attribute and manufacturer. For example, the "Reallocated Sectors Count" attribute's normalized value decreases as the count of reallocated sectors increases. In this case, the attribute's raw value will often indicate the actual count of sectors that were reallocated, although vendors are in no way required to adhere to this convention.

As manufacturers do not necessarily agree on precise attribute definitions and measurement units, the following list of attributes should be regarded as a general guide only.

Legend
	Higher raw value is better
	Lower raw value is better
Critical: pink colored row	Potential indicators of imminent electromechanical failure

ID	Hex	Attribute name	Better	Description
01	0x01	Read Error Rate		(Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number.
02	0x02	Throughput Performance		Overall (general) throughput performance of a hard disk drive. If the value of this attribute is decreasing there is a high probability that there is a problem with the disk.
03	0x03	Spin-Up Time		Average time of spindle spin up (from zero RPM to fully operational [milliseconds]).
04	0x04	Start/Stop Count		A tally of spindle start/stop cycles. The spindle turns on, and hence the count is increased, both when the hard disk is turned on after having been turned entirely off (disconnected from power source) and when the hard disk returns from sleep mode.^[16]
05	0x05	Reallocated Sectors Count		Count of reallocated sectors. When the hard drive finds a read/write/verification error, it marks that sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping, and reallocated sectors are called "remaps". The raw value normally represents a count of the bad sectors that have been found and remapped. Thus, the higher the attribute value, the more sectors the drive has had to reallocate. This allows a drive with bad sectors to continue operation; however, a drive which has had any reallocations at all is significantly more likely to fail in the near future.^[3] While primarily used as a metric of the life expectancy of the drive, this number also affects performance. As the count of reallocated sectors increases, the read/write speed tends to become worse because the drive head is forced to seek to the reserved area whenever a remap is accessed. If sequential access speed is critical, the remapped sectors can be manually marked as bad blocks in the file system in order to prevent their use.
06	0x06	Read Channel Margin		Margin of a channel while reading data. The function of this attribute is not specified.
07	0x07	Seek Error Rate	—	(Vendor specific raw value.) Rate of seek errors of the magnetic heads. If there is a partial failure in the mechanical positioning system, then seek errors will arise. Such a failure may be due to numerous factors, such as damage to a servo, or thermal widening of the hard disk. The raw value has different structure for different vendors and is often not meaningful as a decimal number.
08	0x08	Seek Time Performance		Average performance of seek operations of the magnetic heads. If this attribute is decreasing, it is a sign of problems in the mechanical subsystem.
09	0x09	Power-On Hours (POH)		Count of hours in power-on state. The raw value of this attribute shows total count of hours (or minutes, or seconds, depending on manufacturer) in power-on state.^[17] On some pre-2005 drives, this raw value may advance erratically and/or "wrap around" (reset to zero periodically).^[18]
10	0x0A	Spin Retry Count	Lower	Count of retried spin-start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.
11	0x0B	Recalibration Retries or Calibration Retry Count		This attribute indicates the number of times recalibration was requested (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.
12	0x0C	Power Cycle Count		This attribute indicates the count of full hard disk power on/off cycles.
13	0x0D	Soft Read Error Rate		Uncorrected read errors reported to the operating system.
170	0xAA	Available Reserved Space		See attribute E8^[19]
171	0xAB	SSD Program Fail Count		(Kingston) Counts the number of flash program failures. This attribute returns the total number of Flash program operation failures since the drive was deployed. This attribute is identical to attribute 181.
172	0xAC	SSD Erase Fail Count		(Kingston) Counts the number of flash erase failures. This attribute returns the total number of Flash erase operation failures since the drive was deployed. This attribute is identical to attribute 182.
174	0xAE	Unexpected power loss count		Also known as "Power-off Retract Count" per conventional HDD terminology. Raw value reports the number of unclean shutdowns, cumulative over the life of an SSD, where an "unclean shutdown" is the removal of power without STANDBY IMMEDIATE as the last command (regardless of PLI activity using capacitor power). Normalized value is always 100.^[20]
175	0xAF	Power Loss Protection Failure		Last test result as microseconds to discharge cap, saturated at its maximum value. Also logs minutes since last test and lifetime number of tests. Raw value contains the following data: Bytes 0-1: Last test result as microseconds to discharge cap, saturates at max value. Test result expected in range 25 <= result <= 5000000, lower indicates specific error code. Bytes 2-3: Minutes since last test, saturates at max value. Bytes 4-5: Lifetime number of tests, not incremented on power cycle, saturates at max value. Normalized value is set to one on test failure or 11 if the capacitor has been tested in an excessive temperature condition, otherwise 100.^[20]
177	0xB1	Wear Range Delta		Delta between most-worn and least-worn Flash blocks. It indicates, in technical terms, how well the SSD's wear leveling is working.
179	0xB3	Used Reserved Block Count Total		"Pre-Fail" Attribute used at least in Samsung devices.
180	0xB4	Unused Reserved Block Count Total		"Pre-Fail" Attribute used at least in HP devices.
181	0xB5	Program Fail Count Total or Non-4K Aligned Access Count		Total number of Flash program operation failures since the drive was deployed,^[21] or the number of user data accesses (both reads and writes) where LBAs are not 4 KiB aligned (LBA % 8 != 0) or where the size is not a multiple of 4 KiB (block count != 8), assuming a logical block size (LBS) of 512 B.^[22]
182	0xB6	Erase Fail Count		"Pre-Fail" Attribute used at least in Samsung devices.
183	0xB7	SATA Downshift Error Count or Runtime Bad Block		Western Digital and Samsung attribute. (or) Seagate.
184	0xB8	End-to-End error / IOEDC		This attribute is a part of Hewlett-Packard's SMART IV technology, as well as part of other vendors' IO Error Detection and Correction schemas, and it contains a count of parity errors which occur in the data path to the media via the drive's cache RAM.^[23]
185	0xB9	Head Stability		Western Digital attribute.
186	0xBA	Induced Op-Vibration Detection		Western Digital attribute.
187	0xBB	Reported Uncorrectable Errors	Lower	The count of errors that could not be recovered using hardware ECC (see attribute 195).
188	0xBC	Command Timeout	Lower	The count of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero; if the value is far above zero, there are most likely serious problems with the power supply or an oxidized data cable.^[24]
189	0xBD	High Fly Writes		HDD producers implement a Fly Height Monitor that attempts to provide additional protections for write operations by detecting when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive. This attribute indicates the count of these errors detected over the lifetime of the drive. This feature is implemented in most modern Seagate drives^[2] and some of Western Digital’s drives, beginning with the WD Enterprise WDE18300 and WDE9180 Ultra2 SCSI hard drives, and will be included on all future WD Enterprise products.^[25]
190	0xBE	Airflow Temperature (WDC) resp. Airflow Temperature Celsius (HP)		Airflow temperature on Western Digital HDs (Same as temp. [C2], but current value is 50 less for some models. Marked as obsolete.)
190	0xBE	Temperature Difference from 100		Value is equal to (100−temp. °C), allowing manufacturer to set a minimum threshold which corresponds to a maximum temperature.
191	0xBF	G-sense Error Rate		The count of errors resulting from externally induced shock & vibration.
192	0xC0	Power-off Retract Count, Emergency Retract Cycle Count (Fujitsu),^[26] or Unsafe Shutdown Count		Count of times the heads are loaded off the media. Heads can be unloaded without actually powering off.^{[citation needed]}
193	0xC1	Load Cycle Count or Load/Unload Cycle Count (Fujitsu)		Count of load/unload cycles into head landing zone position.^[26] Western Digital rates their VelociRaptor drives for 600,000 load/unload cycles,^[27] and WD Green drives for 300,000 cycles;^[28] the latter ones are designed to unload heads often to conserve power. On the other hand, the WD3000GLFS (a desktop drive) is specified for only 50,000 load/unload cycles.^[29] Some laptop drives and "green power" desktop drives are programmed to unload the heads whenever there has not been any activity for a very short period of time, such as about five seconds.^[30]^[31] Many Linux installations write to the file system a few times a minute in the background.^[32] As a result, there may be 100 or more load cycles per hour, and the load cycle rating may be exceeded in less than a year.^[33]
194	0xC2	Temperature resp. Temperature Celsius		Current internal temperature.
195	0xC3	Hardware ECC Recovered	—	(Vendor-specific raw value.) The raw value has different structure for different vendors and is often not meaningful as a decimal number.
196	0xC4	Reallocation Event Count	Lower	Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area. Both successful and unsuccessful attempts are counted.^[34]
197	0xC5	Current Pending Sector Count	Lower	Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it is written.^[35] However, some drives will not immediately remap such sectors when written; instead, the drive will first attempt to write to the problem sector, and if the write operation is successful the sector will be marked good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming: if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors.
198	0xC6	Uncorrectable Sector Count or Offline Uncorrectable or Off-Line Scan Uncorrectable Sector Count^[26]		The total count of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.
199	0xC7	UltraDMA CRC Error Count		The count of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check).
200	0xC8	Multi-Zone Error Rate ^[36]		The count of errors found when writing a sector. The higher the value, the worse the disk's mechanical condition is.
200	0xC8	Write Error Rate (Fujitsu)		The total count of errors when writing a sector.^[37]
201	0xC9	Soft Read Error Rate or TA Counter Detected		Count of off-track errors.
202	0xCA	Data Address Mark errors or TA Counter Increased		Count of Data Address Mark errors (or vendor-specific).^{[citation needed]}
203	0xCB	Run Out Cancel		The number of errors caused by incorrect checksum during the error correction.
204	0xCC	Soft ECC Correction	Lower	Count of errors corrected by software ECC.^{[citation needed]}
205	0xCD	Thermal Asperity Rate (TAR)	Lower	Count of errors due to high temperature.^[24]
206	0xCE	Flying Height		Height of heads above the disk surface. A flying height that's too low increases the chances of a head crash while a flying height that's too high increases the chances of a read/write error.^{[citation needed]}
207	0xCF	Spin High Current	Lower	Amount of surge current used to spin up the drive.^[24]
208	0xD0	Spin Buzz		Count of buzz routines needed to spin up the drive due to insufficient power.^[24]
209	0xD1	Offline Seek Performance		Drive’s seek performance during its internal tests.^[24]
210	0xD2	Vibration During Write		(Found in Maxtor 6B200M0 200 GB and Maxtor 2R015H1 15 GB disks.)
211	0xD3	Vibration During Write		Vibration During Write^{[citation needed]}
212	0xD4	Shock During Write		Shock During Write^{[citation needed]}
220	0xDC	Disk Shift		Distance the disk has shifted relative to the spindle (usually due to shock or temperature). Unit of measure is unknown.
221	0xDD	G-Sense Error Rate		The count of errors resulting from externally induced shock & vibration.
222	0xDE	Loaded Hours		Time spent operating under data load (movement of magnetic head armature)^{[citation needed]}
223	0xDF	Load/Unload Retry Count		Count of times head changes position.^{[citation needed]}
224	0xE0	Load Friction	Lower	Resistance caused by friction in mechanical parts while operating.^{[citation needed]}
225	0xE1	Load/Unload Cycle Count		Total count of load cycles^{[citation needed]}
226	0xE2	Load 'In'-time		Total time of loading on the magnetic heads actuator (time not spent in parking area).^{[citation needed]}
227	0xE3	Torque Amplification Count		Count of attempts to compensate for platter speed variations^{[citation needed]}
228	0xE4	Power-Off Retract Cycle		The count of times the magnetic armature was retracted automatically as a result of cutting power.^{[citation needed]}
230	0xE6	GMR Head Amplitude		Amplitude of "thrashing" (distance of repetitive forward/reverse head motion)^{[citation needed]}
230	0xE6	Drive Life Protection Status		Current state of drive operation based upon the Life Curve^[38]
231	0xE7	Temperature		Drive Temperature
231	0xE7	SSD Life Left		Indicates the approximate SSD life left, in terms of program/erase cycles or Flash blocks currently available for use.^[38]
232	0xE8	Endurance Remaining		Number of physical erase cycles completed on the drive as a percentage of the maximum physical erase cycles the drive is designed to endure
232	0xE8	Available Reserved Space		Intel SSD reports the number of available reserved space as a percentage of reserved space in a brand new SSD.
233	0xE9	Power-On Hours		Number of hours elapsed in the power-on state.
233	0xE9	Media Wearout Indicator		Intel SSD reports a normalized value of 100 (when the SSD is new) and declines to a minimum value of 1. It decreases while the NAND erase cycles increase from 0 to the maximum-rated cycles.
234	0xEA	Average Erase Count AND Maximum Erase Count		Decoded as: bytes 0-1-2 = average erase count (big endian) and bytes 3-4-5 = max erase count (big endian).^[39]
235	0xEB	Good Block Count AND System (Free) Block Count		Decoded as: bytes 0-1-2 = good block count (big endian) and bytes 3-4 = system (free) block count.
240	0xF0	Head Flying Hours		Time spent during the positioning of the drive heads^{[citation needed]}
240	0xF0	Transfer Error Rate (Fujitsu)		Count of times the link is reset during a data transfer.^[40]
241	0xF1	Total LBAs Written		Total count of LBAs written
242	0xF2	Total LBAs Read		Total count of LBAs read. Some S.M.A.R.T. utilities will report a negative number for the raw value since in reality it has 48 bits rather than 32.
249	0xF9	NAND_Writes_1GiB		Total NAND Writes. Raw value reports the number of writes to NAND in 1 GB increments.^[41]
250	0xFA	Read Error Retry Rate		Count of errors while reading from a disk
254	0xFE	Free Fall Protection		Count of "Free Fall Events" detected ^[42]
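Several of the descriptions in the chart amount to small decoding rules for raw or normalized values. The Python sketch below illustrates three of them: the packed erase counts of attribute 234, the "Temperature Difference from 100" form of attribute 190, and the alignment test behind the "Non-4K Aligned Access Count" form of attribute 181. The function names are hypothetical, the six-byte raw value is assumed to be supplied with byte 0 first, and the size check is interpreted as "block count is a multiple of 8", which is one reading of the chart entry. Actual vendor encodings may differ.

```python
from typing import Tuple


def decode_erase_counts(raw: bytes) -> Tuple[int, int]:
    """Attribute 234: bytes 0-2 = average erase count, bytes 3-5 = maximum
    erase count, both big endian. Assumes raw is given with byte 0 first."""
    if len(raw) != 6:
        raise ValueError("expected a 6-byte raw value")
    average = int.from_bytes(raw[0:3], "big")
    maximum = int.from_bytes(raw[3:6], "big")
    return average, maximum


def temperature_from_difference(value: int) -> int:
    """Attribute 190 reported as (100 - temperature in degrees Celsius)."""
    return 100 - value


def is_4k_aligned(lba: int, block_count: int) -> bool:
    """True if an access starts on a 4 KiB boundary and covers whole 4 KiB
    units, assuming 512-byte logical blocks (8 blocks per 4 KiB).
    Interpretation: the size test is 'block_count is a multiple of 8'."""
    return lba % 8 == 0 and block_count % 8 == 0
```

For instance, the bytes `00 00 0A 00 01 00` decode to an average erase count of 10 and a maximum of 256, and a reported value of 62 for the temperature-difference form corresponds to 38 °C.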

Threshold Exceeds Condition

Threshold Exceeds Condition (TEC) is an estimated date when a critical drive statistic attribute will reach its threshold value. When Drive Health software reports a "Nearest T.E.C.", it should be regarded as a "Failure date". Sometimes, no date is given and the drive can be expected to work without errors.^[43]

To predict the date, the drive tracks the rate at which the attribute changes. Note that TEC dates are only estimates; hard drives can and do fail much sooner or much later than the TEC date.^[44]
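As an illustration of how a rate-based estimate of this kind might be computed, the Python sketch below extrapolates linearly from two observations of an attribute's normalized value. The linear model, the function name and the sample figures are assumptions made for this example; actual monitoring software may use a different method.

```python
from datetime import date, timedelta
from typing import Optional


def estimate_tec(first_day: date, first_value: int,
                 last_day: date, last_value: int,
                 threshold: int) -> Optional[date]:
    """Estimate the date on which a declining normalized value reaches
    its threshold, by linear extrapolation between two observations."""
    elapsed_days = (last_day - first_day).days
    drop = first_value - last_value
    if last_value <= threshold:
        return last_day          # threshold already reached
    if elapsed_days <= 0 or drop <= 0:
        return None              # value is not declining: no date can be given
    # Days needed to lose the remaining margin at the observed rate.
    days_left = (last_value - threshold) * elapsed_days / drop
    return last_day + timedelta(days=round(days_left))
```

With hypothetical readings of 100 on 1 January 2014 and 90 on 11 April 2014 (100 days later) and a threshold of 36, the value is falling by one point every ten days, so the remaining 54 points give an estimate 540 days after the second reading. A stable attribute yields no date, which corresponds to the case where no T.E.C. is reported.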

Self-tests

SMART drives may offer a number of self-tests:^[45]^[46]^[47]

Short: Checks the electrical and mechanical performance as well as the read performance of the disk. Electrical tests might include a test of buffer RAM, a read/write circuitry test, or a test of the read/write head elements. The mechanical test includes seeking and servo on data tracks. The test scans small parts of the drive's surface (the area is vendor-specific and there is a time limit on the test) and checks the list of pending sectors that may have read errors. It usually takes under two minutes.

Long/Extended: A longer and more thorough version of the short self-test that scans the entire disk surface, with no time limit. It usually takes hundreds of minutes, at approximately one gigabyte per minute^{[citation needed]} on modern drives.

Conveyance: Intended as a quick test to identify damage incurred during transporting of the device from the drive manufacturer to the computer manufacturer.^[48] Only available on ATA drives, and it usually takes several minutes.

Selective: Some drives allow selective self-tests of just a part of the surface.^[49]

The self-test logs for SCSI and ATA drives are slightly different. It is possible for the long test to pass even if the short test fails.^[50]

References

^ "Communicating With Your SSD: Understanding SMART Attributes | Samsung SSD". Samsung.com. Retrieved 2014-01-18.
^ ^a ^b Enhanced Smart attributes (PDF) (statement), Seagate
^ ^a ^b Pinheiro, Eduardo; Weber, Wolf-Dietrich; Barroso, Luís André, "Conclusion", Failure Trends in a Large Disk Drive Population (PDF), 1600 Amphitheatre Pkwy Mountain View, CA 94043: Google
^ ^a ^b SMART, PCTechGuide, 2003
^ No. ZG92-0289 (announcement letter), IBM, September 1, 1992
^ ^a ^b Ottem & Plummer 1995.
^ Compaq. IntelliSafe. Technical Report SFF-8035, Small Form Factor Committee, January 1995
^ "ATA/ATAPI Command Set (ATA8-ACS)" (PDF), AT Attachment 8 (working draft) (0 ed.), ANSI INCITS, August 17, 2004
^ Stephens 2006, pp. 44–126, 198–213, 327–44, Sections 4.19: "SMART (Self-monitoring, analysis, and reporting technology) feature set", 7.52: "SMART", Annex A: "Log Page Definitions"
^ "ATA/ATAPI Command Set (ATA8-ACS)" (PDF), AT Attachment 8 (working draft) (6a ed.), ANSI INCITS, September 6, 2008
^ Hitachi Travelstar 80GN (PDF) (hard disk drive specification) (2.0 ed.), Hitachi Data Systems, 19 September 2003, Hitachi Document Part Number S13K-1055-20
^ Hatfield, Jim (September 30, 2005), SMART Attribute Annex (PDF), T13, e05148r0
^ "Maxtor", Smart mon tools (plain text) (example), Source forge
^ Stephens 2006, p. 207. Of the 512 octets listed in table 42 on page 207 ("Device SMART data structure"), a total of 489 are marked as "Vendor specific".
^ "FAQ", Smartmontools, Source forge, Attribute 194 (Temperature Celsius) behaves strangely on my Seagate disk
^ "Self-Monitoring, Analysis and Reporting Technology (SMART)", Smart Linux (article), Source forge, 2009-03-10
^ "9109: S.M.A.R.T. Attribute: Power-On Hours (POH)", Knowledge Base, Acronis
^ "FAQ". Smartmontools. Sourceforge. Retrieved 2013-01-15.
^ Intel Solid-state Drive DC S3700 Series Product Specification (pdf) (product manual), Intel, March 2014
^ ^a ^b Intel Solid-state Drive DC S3700 Series Product Specification (pdf) (product manual), Intel, March 2014
^ "SMART Attribute Details" (PDF). Kingston Technology Corporation. 2013. p. 4. Archived from the original (PDF) on 2013-05-07. Retrieved 3 August 2013.
^ "The SMART Command Feature Set" (PDF). Micron Technology, Inc. August 2010. p. 11. Archived from the original (PDF) on 2013-02-01. Retrieved 3 August 2013.
^ "SMART IV Technology on HP Business Desktop Hard Drives" (PDF). Hewlett-Packard. Retrieved 8 September 2011.
^ ^a ^b ^c ^d ^e S.M.A.R.T. attribute list (ATA), HD sentinel
^ Fly Height Monitor Improves Hard Drive Reliability (PDF), Western Digital, April 1999, 79-850123-000
^ ^a ^b ^c MHT2080AT, MHT2060AT, MHT2040AT, MHT2030AT, MHG2020AT Disk Drives (PDF) (product manual), Fujitsu, 2003-07-04, C141-E192-02EN
^ WD VelociRaptor Spec Sheet (PDF), WD
^ WD Green Spec Sheet (PDF), WD
^ "WD VelociRaptor SATA Hard Drives" (PDF). wdc.com. 2008. Retrieved 2014-03-31.
^ "Problem with hard drive clicking", Think (wiki)
^ "hdparm(8) - Linux manual page". man7.org. November 2012. Retrieved 2014-03-31. Get/set the Western Digital (WD) Green Drive's "idle3" timeout value. This timeout controls how often the drive parks its heads and enters a low power consumption state. The factory default is eight (8) seconds, which is a very poor choice for use with Linux. Leaving it at the default will result in hundreds of thousands of head load/unload cycles in a very short period of time.
^ discussion list, Arch Linux, If linux tends to write to /var/log/* every 30s, then the heads can park/unpark every 30s.
^ "Hard drives", How to Reduce Power Consumption (wiki), Think, The files access time update, while mandated by POSIX, is causing lots of disks access; even accessing files on disk cache may wake the ATA or USB bus.
^ "S.M.A.R.T.-Attribut: Reallocation Event Count", Knowledge Base, Acronis
^ "S.M.A.R.T. Attribute: Power-On Hours (POH)", Knowledge Base, Acronis
^ Cabla, Lubomir (2009-08-06). "HDAT2 v4.6 User's Manual" (PDF) (1.1 ed.).
^ "Attributes". SMART Linux project. Source forge.
^ ^a ^b SMART Attribute Details (PDF), Kingston
^ "Ticket 171". Smartmontools (log). Source forge.
^ "MHY2xxxBH Disk Drives, Product/Maintenance Manual" (PDF). Fujitsu Limited. C141-E192-02EN.
^ Intel Solid-state Drive 520 Series Product Specification (pdf) (product manual), Intel, February 2012
^ Momentus 7200.2 SATA (PDF) (product manual) (D ed.), Seagate, September 2007, Hitachi Document Part Number S13K-1055-20
^ "FAQ", Drive health, retrieved October 4, 2011
^ The interpretation of the TEC and the SMART, Altrix soft, retrieved October 4, 2011
^ "self-tests: "SMART RUN/ABORT OFFLINE TEST AND SELF-TEST OPTIONS: -t TEST, --test=TEST"", SMARTCTL
^ HDDScan – free HDD test utility with USB flash and RAID support.
^ Evans, Mark (26 April 1999), Hard Drive Self-tests (PDF), Milpitas, CA US: T10
^ Bulik, Darrin (Sep 24, 2001), Proposal for Extensions To Drive Self Test (PDF), Lake Forest, CA: T10
^ McLean, Pete (23 October 2001), Proposal for a Selective Self-test (PDF), Longmont, CO: T10
^ "HDD fails S.M.A.R.T. short test, but passes long test?". Hardware Canucks. Retrieved 2013-01-15.

External links

UC Santa Cruz and Quantum release S.M.A.R.T. software for Linux, Michael Cornwell.
UCSC SMART suite, SourceForge by: cornwell.
How does smartmontools differ from smartsuite?, SourceForge.
S.M.A.R.T. Monitoring Tools, SourceForge by: ballen4705.
smartmontools & smartsuite, smartmontools.org.
GSmartControl is a GUI for smartctl (part of smartmontools) by Alexander Shaduri
How S.M.A.R.T. is your hard drive?, UK: pc-king.co.uk.
How to predict hard disk failure (SMART Report), 2010-05-19 with Palimpsest (originally by David Zeuthen for Red Hat)
KB251: Understanding S.M.A.R.T. and S.M.A.R.T. failure and errors, Western Digital.
How does S.M.A.R.T. function of hard disks Work?.
HDDExpert - Clear view of your SMART data.

@@ Line 321: / Line 321: @@
 * [http://sourceforge.net/projects/gsmartcontrol/ GSmartControl] is a [[Graphical user interface|GUI]] for [http://sourceforge.net/projects/smartmontools/ smartctl (part of smartmontools)] by [http://sourceforge.net/users/alex-sh Alexander Shaduri]
 * {{Citation | url = http://www.pc-king.co.uk/tips3.htm | title = How S.M.A.R.T. is your hard drive? | publisher = pc-king.co.uk | place = [[United Kingdom|UK]]}}.
-* {{Citation | url = http://karuppuswamy.com/wordpress/2010/05/19/how-to-predict-hard-disk-failure-in-ubuntu-with-3-clicks/ | title = How to predict hard disk failure (SMART Report) | date = 2010-05-19}} with [[GNOME Disks|Palimpsest]] (originally by Red Hat)
+* {{Citation | url = http://karuppuswamy.com/wordpress/2010/05/19/how-to-predict-hard-disk-failure-in-ubuntu-with-3-clicks/ | title = How to predict hard disk failure (SMART Report) | date = 2010-05-19}} with [[GNOME Disks|Palimpsest]] (originally by [https://fedoraproject.org/wiki/Features/UdisksImprovements David Zeuthen] for  [[Red Hat]])
 * {{Citation | url = http://wdc.custhelp.com/app/answers/detail/a_id/251/ | publisher = Western Digital | title = KB251: Understanding S.M.A.R.T. and S.M.A.R.T. failure and errors}}.
 * {{Citation | url = http://www.hdsentinel.com/smart/index.php | title = How does S.M.A.R.T. function of hard disks Work?}}.