Native Command Queuing
In computing, Native Command Queuing (NCQ) is an extension of the Serial ATA protocol allowing hard disk drives to internally optimize the order in which received read and write commands are executed. This can reduce the amount of unnecessary drive head movement, resulting in increased performance (and slightly decreased wear of the drive) for workloads where multiple simultaneous read/write requests are outstanding, most often occurring in server-type applications.
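The effect of reordering can be illustrated with a toy model. The sketch below compares servicing queued requests in arrival order with a greedy "nearest request first" order. The function names, the request positions and the greedy policy are assumptions made for illustration only; real drive firmware also accounts for rotational position, and its scheduling algorithms are generally not public.

```python
def total_head_travel(start, order):
    """Sum of absolute head movements when servicing `order` from `start`."""
    travel, pos = 0, start
    for lba in order:
        travel += abs(lba - pos)
        pos = lba
    return travel


def nearest_first(start, pending):
    """Greedy reordering: always service the closest pending request next."""
    remaining = list(pending)
    order, pos = [], start
    while remaining:
        nxt = min(remaining, key=lambda lba: abs(lba - pos))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


# Hypothetical queue of request positions, listed in arrival order.
queue = [900, 100, 850, 150, 800]
fifo_travel = total_head_travel(500, queue)
reordered = nearest_first(500, queue)
reordered_travel = total_head_travel(500, reordered)
print(fifo_travel, reordered, reordered_travel)
```

In this model the reordered schedule covers far less distance than the arrival-order schedule, which is the kind of saving NCQ aims for when several requests are outstanding at once.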
Native Command Queuing was preceded by Parallel ATA's version of Tagged Command Queuing (TCQ). ATA's attempt at integrating TCQ was constrained by the requirement that ATA host bus adapters use ISA bus device protocols to interact with the operating system. The resulting high CPU overhead and negligible performance gain contributed to a lack of market acceptance for TCQ.
NCQ differs from TCQ in how commands and data transfers are handled. With NCQ, each command is of equal importance, and the host bus adapter programs its own first-party DMA engine with DMA parameters supplied by the CPU during its command sequence. TCQ, by contrast, interrupts the CPU during command queries and requires it to modulate the ATA host bus adapter's third-party DMA engine. NCQ's implementation is preferable because the drive has more accurate knowledge of its performance characteristics and is able to account for its rotational position. Both NCQ and TCQ have a maximum queue length of 32 outstanding commands (31 in practice).
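The 32-command limit can be pictured as a pool of 32 tags, one per outstanding command. The following is a minimal sketch of such a pool, tracked as a single 32-bit mask. The class and method names are hypothetical, and the code is not a description of any actual controller's register layout.

```python
class TagAllocator:
    """Tracks which of the 32 command tags are in flight using one bit mask."""

    NUM_TAGS = 32

    def __init__(self):
        self.active = 0  # bit i set => tag i is outstanding

    def allocate(self):
        """Return the lowest free tag, or None if all 32 are in use."""
        for tag in range(self.NUM_TAGS):
            if not self.active & (1 << tag):
                self.active |= 1 << tag
                return tag
        return None

    def complete(self, tag):
        """Mark a tag as finished so that it can be reused."""
        self.active &= ~(1 << tag)
```

A host using such a pool would hold back further commands whenever `allocate()` returns `None`, and release a tag when the drive reports the corresponding command as complete.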
For NCQ to be enabled, it must be supported and enabled in the SATA host bus adapter and in the hard drive itself. The appropriate driver must be loaded into the operating system to enable NCQ on the host bus adapter.
Many newer chipsets support the Advanced Host Controller Interface (AHCI), which allows operating systems to universally control them and enable NCQ. Newer mainstream Linux kernels support AHCI natively, and FreeBSD has fully supported AHCI since version 8.0. Windows Vista and Windows 7 also natively support AHCI, but their AHCI support (via the msahci service) must be manually enabled via registry editing if controller support was not present during their initial install. Windows 7's AHCI enables not only NCQ but also TRIM support on SSDs (with supporting firmware). Older operating systems such as Windows XP require the installation of a vendor-specific driver (similar to installing a RAID or SCSI controller) even if AHCI is present on the host bus adapter. This makes initial setup more tedious and conversions of existing installations relatively difficult, as most controllers cannot operate their ports in mixed AHCI–SATA/IDE/legacy mode.
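On a Linux system, one way to check whether command queuing appears to be active is to read the queue depth that the kernel reports for a disk. The sketch below assumes that the system exposes this value at /sys/block/<device>/device/queue_depth, as Linux typically does for SATA disks; the function names and the "greater than 1" heuristic are illustrative assumptions, not an official interface.

```python
from pathlib import Path


def read_queue_depth(device, sysfs_root="/sys/block"):
    """Return the queue depth reported for `device` (e.g. "sda"), or None."""
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a number: report "unknown".
        return None


def ncq_appears_enabled(device, sysfs_root="/sys/block"):
    """Heuristic: a depth greater than 1 suggests command queuing is active."""
    depth = read_queue_depth(device, sysfs_root)
    return depth is not None and depth > 1
```

For example, `ncq_appears_enabled("sda")` would return `False` on a system where the file is absent or reports a depth of 1. The `sysfs_root` parameter exists only so that the functions can be tried against a test directory.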
Performance with magnetic hard drives
A 2004 test with a first-generation NCQ drive (Seagate 7200.7 NCQ) found that while NCQ increased IOMeter performance, desktop application performance actually decreased. One review in 2010 found improvements on the order of 9% (on average) with NCQ enabled in a series of Windows multitasking tests.
NCQ can interfere with the operating system's I/O scheduler and actually decrease performance; this has been observed in practice on Linux with RAID-5. NCQ provides no mechanism for the host to specify a deadline for an I/O, such as a limit on how many times a request can be passed over in favor of others. In theory, a queued request can be delayed by the drive for an arbitrary amount of time while it serves other (possibly newer) requests under I/O pressure. Since the algorithms used inside drive firmware for NCQ dispatch ordering are generally not publicly known, this introduces another level of uncertainty about hardware and firmware performance. Tests at Google around 2008 showed that NCQ can delay an I/O for up to 1–2 seconds. A proposed workaround is for the operating system to artificially starve the NCQ queue sooner in order to satisfy low-latency applications in a timely manner.
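The starvation problem and the workaround can be sketched with a small simulation. The drive model here (always serve the queued request nearest the head) is an assumption, since real dispatch algorithms are not public, and the `max_age` parameter is a hypothetical knob standing in for the host's decision to stop feeding the queue once a request has waited too long.

```python
def completions_before_far(far_lba, near_stream, depth, max_age=None):
    """Count the commands the drive completes before it services `far_lba`.

    Assumed drive model: always pick the queued request closest to the head.
    The host refills the queue from `near_stream` up to `depth` entries,
    unless the oldest queued request has already waited `max_age`
    completions, in which case it stops issuing new commands.
    """
    stream = iter(near_stream)
    queue = [(far_lba, 0)]  # entries are (lba, issue_time)
    head, clock = 0, 0
    while True:
        oldest_age = clock - min(issued for _, issued in queue)
        throttled = max_age is not None and oldest_age >= max_age
        while not throttled and len(queue) < depth:
            nxt = next(stream, None)
            if nxt is None:
                break
            queue.append((nxt, clock))
        entry = min(queue, key=lambda r: abs(r[0] - head))
        queue.remove(entry)
        if entry[0] == far_lba:
            return clock
        head, clock = entry[0], clock + 1


# One distant request competes with a steady stream of nearby requests.
unbounded = completions_before_far(10_000, range(1, 101), depth=4)
bounded = completions_before_far(10_000, range(1, 101), depth=4, max_age=8)
print(unbounded, bounded)
```

In this model the distant request waits behind the entire stream of nearby requests unless the host withholds new commands, in which case the queue drains and the distant request is served shortly after the age limit is reached.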
Safety with magnetic drives (FUA)
One little-known feature of NCQ is that (unlike its ATA TCQ predecessor) the host can specify whether it wants to be notified of completion when the data hits the disk's platters or when it hits the disk's buffer (on-board cache). Assuming a correct hardware implementation, this feature allows the disk's on-board cache to be used while guaranteeing correct semantics for system calls like fsync. This write flag (also borrowed from SCSI) is called Force Unit Access (FUA).
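From an application's point of view, these semantics are reached through calls such as fsync. The following minimal sketch writes a file and returns only after the operating system reports that the data has been flushed. The function name is hypothetical, and whether the kernel satisfies the request with FUA writes or with a separate cache flush depends on the operating system, driver and drive, which the application cannot observe.

```python
import os


def durable_write(path, data):
    """Write `data` (bytes) to `path`; return only after fsync completes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        os.fsync(fd)  # ask the OS to push the data to stable storage
    finally:
        os.close(fd)
```

A database or file system journal would typically use this pattern for records that must survive a power loss, at the cost of waiting for the drive on every call.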
NCQ in solid-state drives
NCQ is also used in newer solid-state drives, where the drive encounters latency on the host rather than the other way around. For example, Intel's X25-E Extreme solid-state drive uses NCQ to ensure that the drive has commands to process while the host system is busy processing CPU tasks.
NCQ also enables the SSD controller to complete commands concurrently (or partly concurrently, for example using pipelines) where the internal organisation of the device enables such processing.
For example, the SandForce 1200 based OCZ Vertex II 50 GB drive running on a Dell Perc 5i (which does not support SATA NCQ) delivers about 7,000 4k IOPS (50% write) at a controller queue depth of 32 I/Os. Moving the drive to the similar Dell Perc 6i (which does support SATA NCQ) increases this to over 14,000 IOPS on the same basis.
The NVM Express standard also supports command queuing, in a form optimized for SSDs. NVMe allows multiple queues for a single controller or device and also allows a much higher queue depth (64K vs 32) for each queue, which more closely matches how the underlying SSD hardware works.
- PDF white paper on NCQ from Intel and Seagate
- Volume 1 of the final draft of the ATA-7 standard
- "SATA II Native Command Queuing Overview", Intel Whitepaper, April 2003.
- "Seagate's Barracuda 7200.7 NCQ hard drive - The Tech Report - Page 13". The Tech Report. Retrieved 2014-01-11.
- "Multitasking with Native Command Queuing - The Tech Report - Page 5". The Tech Report. Retrieved 2014-01-11.
- Yu, Y. J.; Shin, D. I.; Eom, H.; Yeom, H. Y. (2010). "NCQ vs. I/O scheduler". ACM Transactions on Storage 6: 1. doi:10.1145/1714454.1714456. 
- "hard drive - Poor Linux software RAID 5 performance with NCQ". Server Fault. Retrieved 2014-01-11.
- Gwendal Grignou, NCQ Emulation, FLS'08 talk summary (p. 109) slides
- "Mark Lord: Re: Lower HD transfer rate with NCQ enabled?". LKML. 2007-04-03. Retrieved 2014-01-11.
- Marshall Kirk McKusick. "Disks from the Perspective of a File System - ACM Queue". Queue.acm.org. Retrieved 2014-01-11.
- Gregory Smith (2010). PostgreSQL 9.0: High Performance. Packt Publishing Ltd. p. 78. ISBN 978-1-84951-031-8.
- "Enabling FUA for SATA drives (was Re: [RFC][PATCH] libata: enable SATA disk fua detection on default) (Linux SCSI)". Spinics.net. 2012-08-17. Retrieved 2014-01-24.
- Robert Hancock. "Linux-Kernel Archive: [PATCH RFC] libata: FUA updates". Lkml.indiana.edu. Retrieved 2014-01-24.
- "Forced Unit Access | Working Hard In IT". Workinghardinit.wordpress.com. 2012-10-12. Retrieved 2014-01-24.
- Gasior, Geoff (November 23, 2008). "Intel's X25-E Extreme solid-state drive - Now with single-level cell flash memory". Tech Report.
- Dave Landsman. "AHCI and NVMe as Interfaces for SATA Express™ Devices - Overview" (PDF). SanDisk. Retrieved 2013-10-02.
- "Overview". NVM Express. Retrieved 2014-01-24.