
Deadline Scheduler



The deadline scheduler is an I/O scheduler for the Linux kernel, written in 2002 by Jens Axboe.

Overview

The main goal of the Deadline scheduler is to guarantee a start service time for a request.[1] It does so by imposing a deadline on all I/O operations to prevent starvation of requests. It maintains two deadline queues (one for reads, one for writes) in addition to two similarly split sorted queues. The deadline queues are ordered by expiration time (the deadline), while the sorted queues are ordered by sector number.

Before serving the next request, the Deadline scheduler decides which queue to use. Read queues are given a higher priority, because processes usually block on read operations. Next, the Deadline scheduler checks whether the first request in the deadline queue has expired; if it has, that request is served immediately. Otherwise, the scheduler serves a batch of requests from the sorted queue. In both cases, the scheduler also serves a batch of requests following the chosen request in the sorted queue.

By default, read requests have an expiration time of 500 ms, while write requests expire after 5 seconds.
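This decision can be illustrated with a simplified model. The sketch below is not the kernel implementation: the Request class, the representation of the queues as Python lists, and the batch size are assumptions made for illustration, and every request is assumed to sit in both its deadline (FIFO) queue and its sector-sorted queue. The order of checks, however, mirrors the description above: reads first, expired deadlines next, otherwise the sector-sorted queue.

    import time

    READ_EXPIRE = 0.5    # default read deadline: 500 ms
    WRITE_EXPIRE = 5.0   # default write deadline: 5 s
    BATCH = 16           # requests served per batch (illustrative value)

    class Request:
        """One pending I/O request (illustrative, not a kernel structure)."""
        def __init__(self, sector, is_read):
            self.sector = sector
            self.is_read = is_read
            # Each request is stamped with "now + read_expire / write_expire".
            self.deadline = time.monotonic() + (READ_EXPIRE if is_read else WRITE_EXPIRE)

    def next_batch(read_fifo, write_fifo, read_sorted, write_sorted):
        """Pick the next batch to dispatch.

        The FIFO (deadline) queues are ordered by expiration time, the sorted
        queues by sector number; every request is assumed to be in both of
        its queues.
        """
        # Reads are preferred because processes usually block on them.
        if read_fifo:
            fifo, sorted_q = read_fifo, read_sorted
        elif write_fifo:
            fifo, sorted_q = write_fifo, write_sorted
        else:
            return []

        if fifo[0].deadline <= time.monotonic():
            start = fifo[0]      # first deadline expired: serve that request now
        else:
            start = sorted_q[0]  # nothing expired: stay in sector order
        # In both cases, a batch of requests following the chosen one is served
        # from the sector-sorted queue.
        i = sorted_q.index(start)
        return sorted_q[i:i + BATCH]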

An early version of the scheduler was published by Axboe in September 2002.[citation needed]

The kernel documentation suggests this is the preferred scheduler for database systems, especially those with Tagged Command Queuing (TCQ) aware disks, or any system with a large number of disks.[2] Because preference is given to reads over writes, and because of its latency guarantees, this scheduler can also be well suited to most webserver workloads.
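The active scheduler is selected per block device through sysfs. The following is a minimal sketch of checking and switching it; the device name "sda" is an example, and writing the file requires root privileges.

    from pathlib import Path

    # "sda" is an example device name; adjust to the device being tuned.
    SCHEDULER = Path("/sys/block/sda/queue/scheduler")

    # The file lists the available schedulers with the active one in brackets,
    # e.g. "noop [deadline] cfq".
    print(SCHEDULER.read_text().strip())

    # Writing a scheduler name activates it for this device (requires root).
    SCHEDULER.write_text("deadline\n")
    print(SCHEDULER.read_text().strip())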

sysfs tunables

  • fifo_batch (integer)

Deadline executes I/O Operations (IOPs) through the concept of "batches", which are sets of operations ordered by increasing sector number. This tunable determines how large a batch must be before its requests are queued to the disk (barring expiration of a currently-being-built batch). Smaller batches can reduce latency by ensuring that requests are executed sooner (rather than possibly waiting for more requests to come in), but may degrade overall throughput by increasing the overall movement of drive heads (since sequencing happens within a batch and not between them). Additionally, if the number of IOPs is high enough, the batches will be executed in a timely fashion anyway.

  • read_expire (integer)

The goal of the deadline I/O scheduler is to attempt to guarantee a start service time for a request (focusing mainly on read latencies). When a read request first enters the I/O scheduler, it is assigned a deadline equal to the current time plus the read_expire value (in milliseconds). In other words, this is the tunable that controls the scheduler's notion of an appropriate deadline/expiration for a batch.

  • write_expire (integer)

Identical to read_expire, but for write operations (which are grouped into batches separate from reads).

  • writes_starved (integer)

As stated previously, deadline prefers reads to writes. As a consequence, this can lead to situations where the operations executed are almost entirely reads. This tunable becomes more important as write_expire is lengthened or as overall bandwidth approaches saturation. Decreasing it gives more bandwidth to writes (relatively speaking) at the expense of read operations. If the application workload, however, is read-heavy (for example most HTTP or directory servers) with only an occasional write, decreased latency of average IOPs may be achieved by increasing this value (so that more reads must be performed before a write batch is queued to the disk).

  • front_merges (bool integer)

A "front merge" is an operation where the I/O Scheduler, seeking to condense (or "merge") smaller requests into fewer (larger) operations, will take a new operation then examine the active batch and attempt to locate operations where the beginning sector is the same or immediately after another operation's beginning sector. A "back merge" is the opposite, where ending sectors in the active batch are searched for sectors that are either the same or immediately after the current operation's beginning sectors. Merging diverts operations from the current batch to the active one, decreasing "fairness" in order to increase throughput.

Due to the way files are typically laid out, back merges are much more common than front merges. For some workloads, you may even know that attempting to front merge requests is a waste of time. Setting front_merges to 0 disables this functionality. Front merges may still occur due to the cached last_merge hint, but since that comes at basically zero cost, it is still performed. This boolean simply disables front sector lookup when the I/O scheduler merging function is called. Disk merge totals are recorded per block device in /proc/diskstats.[3] A sketch of reading and setting these tunables through sysfs follows this list.
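While the deadline scheduler is active, the tunables above are exposed as files under /sys/block/<device>/queue/iosched/. The following is a minimal sketch of reading and adjusting them; the device name "sda" and the values written are illustrative examples, not recommendations, and writing requires root privileges.

    from pathlib import Path

    # Deadline tunables are exposed per block device while the scheduler is active.
    # "sda" is an example device name; adjust to the device being tuned.
    IOSCHED = Path("/sys/block/sda/queue/iosched")
    TUNABLES = ("fifo_batch", "read_expire", "write_expire",
                "writes_starved", "front_merges")

    def show_tunables():
        """Print the current value of each deadline tunable."""
        for name in TUNABLES:
            print(name, (IOSCHED / name).read_text().strip())

    def set_tunable(name, value):
        """Write a new value (requires root)."""
        (IOSCHED / name).write_text(f"{value}\n")

    show_tunables()
    # Illustrative changes, not recommendations:
    set_tunable("write_expire", 3000)   # relax the write deadline to 3 s (value in ms)
    set_tunable("front_merges", 0)      # disable front-merge lookups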


References

  1. ^ Jens Axboe (11 November 2002). "Deadline IO scheduler tunables". Linux kernel documentation. Retrieved 20 November 2011.
  2. ^ "Linux IO Scheduler". Wiki. Waikato Linux Users Group. 16 December 2010. Retrieved 20 November 2011.
  3. ^ "Deadline IO Scheduler Tunables".