Full table scan

A full table scan (also known as a sequential scan) is a scan of a database table in which each row is read in sequential (serial) order and the columns encountered are checked against a condition.[1] Full table scans[2] are usually the slowest way to scan a table because of the large number of I/O reads required from disk, which involve multiple seeks as well as costly disk-to-memory transfers.

Overview

In a database, a query that cannot use an index results in a full table scan, in which the database processes each record of the table to find all records that meet the given requirements. Even if the query selects just a few rows, every row in the table is examined. This usually results in suboptimal performance but may be acceptable for very small tables or when the overhead of keeping indexes up to date is high.
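The difference in the number of rows examined can be illustrated with a small sketch. It is a toy model only: the table is a Python list, the "index" is a dictionary, and the names full_table_scan, build_index and index_lookup are hypothetical helpers rather than part of any database system.

```python
from collections import defaultdict


def full_table_scan(rows, predicate):
    """Check every row against the predicate, as an unindexed query would."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if predicate(row):
            matches.append(row)
    return matches, examined


def build_index(rows, column):
    """Map each value of `column` to the rows holding it (a stand-in for an index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def index_lookup(index, value):
    """Return the matching rows; only the matches themselves are touched."""
    matches = index.get(value, [])
    return matches, len(matches)


# 10,000 rows, of which 10 have city == "Oslo".
rows = [{"id": i, "city": "Oslo" if i % 1000 == 0 else "Other"} for i in range(10_000)]

scan_matches, scan_examined = full_table_scan(rows, lambda r: r["city"] == "Oslo")
city_index = build_index(rows, "city")
idx_matches, idx_examined = index_lookup(city_index, "Oslo")

print(scan_examined, len(scan_matches))  # 10000 10
print(idx_examined, len(idx_matches))    # 10 10
```

Both approaches return the same ten rows, but the scan examines all 10,000 rows to find them. Building and maintaining the index has its own cost, which is the overhead mentioned above.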

When the Optimizer Considers a Full Table Scan[3]

The optimizer chooses an access path mainly on the basis of speed. A full table scan is used when it is the fastest option or when no other access path is available. Several situations in which a full table scan is chosen are as follows.

  • No index

If the table has no index that the query can use, the optimizer has no choice but to perform a full table scan.

  • The table has few rows

For a small table, the cost of a full table scan is lower than that of an index range scan.

  • The query uses SELECT COUNT(*) and the indexed column contains nulls

SELECT COUNT(*) must count every row of the table. A typical index may hold no entries for rows whose indexed column is null, so the optimizer cannot rely on the index to count the rows.

  • The query is unselective

The query returns a large share of the rows, approaching 100% of the table, so an index offers little benefit.

  • The table statistics are out of date

The table used to be small but has since grown. Because the statistics have not been refreshed, the optimizer still treats it as a small table and does not know that the index would be faster.

  • The table has a high degree of parallelism

A high degree of parallelism set for the table skews the optimizer toward a full table scan, even when another access path would be better.

  • The query contains a full table scan hint

The hint instructs the optimizer to use a full table scan.
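The trade-off behind several of these cases (no index, a small table, an unselective query, stale statistics) can be sketched with a toy cost model. The formulas, the constant ROWS_PER_BLOCK and the function names below are illustrative assumptions, not the cost model of any real optimizer. Parallelism and hints are not modeled.

```python
import math

ROWS_PER_BLOCK = 100  # assumed number of rows stored per disk block


def full_scan_cost(num_rows):
    """Assumed cost: read every block of the table once, sequentially."""
    return math.ceil(num_rows / ROWS_PER_BLOCK)


def index_scan_cost(num_rows, selectivity, index_depth=3):
    """Assumed cost: descend the index, then one block read per matching row."""
    return index_depth + math.ceil(num_rows * selectivity)


def choose_access_path(estimated_rows, selectivity, has_index=True):
    """Pick the cheaper path using the optimizer's (possibly stale) row estimate."""
    if not has_index:
        return "full table scan"
    if full_scan_cost(estimated_rows) <= index_scan_cost(estimated_rows, selectivity):
        return "full table scan"
    return "index scan"


print(choose_access_path(1_000_000, 0.001, has_index=False))  # full table scan (no index)
print(choose_access_path(50, 0.1))                            # full table scan (small table)
print(choose_access_path(1_000_000, 0.9))                     # full table scan (unselective)
print(choose_access_path(1_000_000, 0.001))                   # index scan

# Stale statistics: the table really holds 1,000,000 rows, but the optimizer
# still believes it holds 50, so it keeps choosing the full table scan.
print(choose_access_path(50, 0.001))                          # full table scan
```

In this sketch the only input that changes between the last two calls is the row estimate, which shows how outdated statistics alone can lead to a full table scan on a large table.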


How to Avoid a Full Table Scan

If a table is large, full table scans should be avoided because of their high cost. In MySQL, for example, the following measures can help:

  • Update the key distributions for the scanned table with ANALYZE TABLE tb1[4]
  • Use FORCE INDEX on the scanned table to tell the MySQL optimizer that a table scan is very expensive compared with using the given index
  • Use SET max_seeks_for_key to tell the optimizer to assume that no key scan causes more than a given number of key seeks
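The first two measures have rough counterparts in other systems. The sketch below uses SQLite through Python's sqlite3 module, not MySQL: SQLite's ANALYZE stands in for ANALYZE TABLE, and its INDEXED BY clause stands in for FORCE INDEX (unlike FORCE INDEX, INDEXED BY makes the query fail if the named index cannot be used). The columns val and note and the index name idx_tb1_val are hypothetical, and no SQLite equivalent of max_seeks_for_key is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (id INTEGER PRIMARY KEY, val INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO tb1 (val, note) VALUES (?, ?)",
    [(i % 500, "row %d" % i) for i in range(5000)],
)
conn.execute("CREATE INDEX idx_tb1_val ON tb1 (val)")  # hypothetical index name

# Refresh the statistics the query planner relies on.
conn.execute("ANALYZE")
stat_rows = conn.execute("SELECT COUNT(*) FROM sqlite_stat1").fetchone()[0]

# Require the planner to use the named index for this query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT note FROM tb1 INDEXED BY idx_tb1_val WHERE val > 490"
).fetchall()
forced_plan = " | ".join(row[3] for row in plan)  # fourth column holds the plan text

print(stat_rows)    # number of statistics rows gathered by ANALYZE
print(forced_plan)  # plan text, which names idx_tb1_val
```

The exact wording of the plan text differs between SQLite versions, so the sketch only relies on the index name appearing in it.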

Example

The first example shows an SQL statement that selects the rows of table1 whose category_id2 is greater than 10:

   SELECT category_id1
   FROM table1
   WHERE category_id2 > 10;

In this situation, assuming category_id2 is not indexed, the database system must scan the full table to find the rows that meet the condition.
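Many database systems can report the access path they would choose. A minimal sketch, assuming SQLite (through Python's sqlite3 module) as the database and a hypothetical index named idx_category_id2, compares the plan for this query before and after an index is created:

```python
import sqlite3

QUERY = "SELECT category_id1 FROM table1 WHERE category_id2 > 10"


def plan_for(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column holds the plan text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (category_id1 INTEGER, category_id2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(i, i % 50) for i in range(1000)],
)

plan_without_index = plan_for(conn, QUERY)

# Hypothetical index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_category_id2 ON table1 (category_id2)")
plan_with_index = plan_for(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

Without the index, SQLite reports a scan of table1. Once the index exists, the plan refers to idx_category_id2 instead. The exact wording of the plan text depends on the SQLite version.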

The second example shows an SQL statement that returns employees' first names in sorted order:

   SELECT first_name 
   FROM employees 
   ORDER BY first_name;

In this situation, absent an index on first_name, the database system must also read the full table in order to compare and sort the first names.
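The extra sorting step can be made visible in the same way. The following sketch again assumes SQLite through Python's sqlite3 module. The sample rows, the last_name column and the index name idx_employees_first_name are hypothetical.

```python
import sqlite3

QUERY = "SELECT first_name FROM employees ORDER BY first_name"


def plan_for(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column holds the plan text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees (first_name, last_name) VALUES (?, ?)",
    [("Carol", "Diaz"), ("Alice", "Khan"), ("Bob", "Lee"), ("Dave", "Ng")],
)

# No index: every row is read and then sorted in a temporary structure.
plan_before = plan_for(conn, QUERY)

# With an index on first_name, rows can be read in index order instead.
conn.execute("CREATE INDEX idx_employees_first_name ON employees (first_name)")
plan_after = plan_for(conn, QUERY)

names = [row[0] for row in conn.execute(QUERY)]

print(plan_before)
print(plan_after)
print(names)  # ['Alice', 'Bob', 'Carol', 'Dave']
```

Before the index is created, the plan includes a separate sorting step for the ORDER BY clause. Afterwards SQLite walks the index, which already holds the names in order, although it still visits an entry for every row.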

Pros and Cons

Pros:

  • The cost is predictable for a given table size, because the database system scans the whole table row by row every time.
  • When the table occupies less than 2 percent of the database block buffer, a full table scan is quicker.

Cons:

  • A full table scan occurs when there is no index or when the SQL statement does not use the index. It is usually slower than an index scan, and the larger the table, the longer it takes to return the data.
  • An unnecessary full table scan causes a large amount of unnecessary I/O, which places a processing burden on the entire database.


References

  1. "Avoiding Table Scans". Oracle. 2011.
  2. "Which is Faster: Index Access or Table Scan?". Microsoft TechNet. 2002.
  3. "Optimizer Access Paths". Oracle. 2013.
  4. "How to Avoid Full Table Scans". MySQL. 2016.