Search algorithm

In computer science, a search algorithm, broadly speaking, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. Most of the algorithms studied by computer scientists that solve problems are kinds of search algorithms.[citation needed] The set of all possible solutions to a problem is called the search space. Brute-force search, otherwise known as naïve or uninformed search, uses the simplest method of searching through the search space, whereas informed search algorithms use heuristic functions to apply knowledge about the structure of the search space to try to reduce the amount of time spent searching.

An uninformed search algorithm is one that does not take the specific nature of the problem into account. Such algorithms can therefore be implemented generically, and the same implementation can then be used in a wide range of problems, thanks to abstraction. The drawback is that, in practice, many search spaces are extremely large, and an uninformed search (especially of tree or graph storage structures) will take an unreasonable amount of time for even relatively small examples. As such, practical needs usually demand something better; typically, this means something faster.

Lists, and sequences generally, are perhaps the most commonly encountered data structures, and search algorithms adapted to them are common as well. The goal is to find one element of a set (i.e., of the list) by some criterion, typically a key, which may have other information associated with it. As this is a common problem in computer science, the computational complexity of these algorithms has been intensively studied.

The simplest such algorithm is linear search, which examines each element of the list in turn. It is expensive in running time compared to many other algorithms: it runs in O(n) time, where n is the number of items in the list. It can be used directly on any unprocessed list, regardless of history, which is sometimes useful. A more sophisticated list search algorithm is binary search, which runs in O(log n) time. This is significantly better than linear search for large lists, and the increase in speed can make it worthwhile even for surprisingly small ones. However, it requires that the list be sorted before searching (see sorting algorithm) and, generally, that the list be randomly accessible. This last requirement may force lengthy (i.e., time-expensive) reads of mass storage before a binary search can be started. Interpolation search is better than binary search for large sorted lists with 'fairly even' key distributions (O(log log n) complexity), but has a worst-case running time of O(n).
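
As an illustration, the following is a minimal sketch of both algorithms in Python; the function names and the example data are invented for this sketch.

    def linear_search(items, key):
        """Examine each element in turn; O(n) comparisons in the worst case."""
        for index, value in enumerate(items):
            if value == key:
                return index
        return -1  # key not present

    def binary_search(sorted_items, key):
        """Repeatedly halve a sorted list; O(log n) comparisons."""
        low, high = 0, len(sorted_items) - 1
        while low <= high:
            mid = (low + high) // 2
            if sorted_items[mid] == key:
                return mid
            elif sorted_items[mid] < key:
                low = mid + 1
            else:
                high = mid - 1
        return -1  # key not present

    # Example usage (made-up data):
    # linear_search([7, 3, 9, 1], 9)  -> 2
    # binary_search([1, 3, 7, 9], 9)  -> 3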

Grover's algorithm is a quantum algorithm which offers quadratic speedup over classical linear search for unsorted lists. However, it runs only on currently non-existent quantum computers.

Hash tables can also be used for list searches: they run in constant time in the average case, but have terrible worst-case time (O(n)). In addition, more space and time are required to set up such tables. Another search based on specialized data structures uses self-balancing binary search trees and needs only O(log n) time. Such tree searches can be seen as extending the ideas of binary search to allow fast insertion and removal in the data structures. See associative array for more discussion of list search data structures.
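
As a small illustration (assuming Python, whose built-in dict type is backed by a hash table), a lookup by key runs in constant time on average; the data here is made up for the example.

    # Build a hash table mapping keys to associated records (illustrative data only).
    phone_book = {"Alice Smith": "020 7946 0001", "Bob Jones": "020 7946 0002"}

    # Average-case O(1) lookup by key.
    number = phone_book.get("Alice Smith")   # -> "020 7946 0001"
    missing = phone_book.get("Carol Brown")  # -> None (key not present)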

Most list search algorithms, such as linear search, binary search, and self-balancing binary search trees, can be extended with little additional cost to find all values less than or greater than a given key, an operation called range search. One exception is hash tables, which cannot perform such searches efficiently.
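
A minimal sketch of a range search over a sorted list, using Python's standard bisect module; the function name and the data are assumptions made for this example.

    import bisect

    def range_search(sorted_items, low_key, high_key):
        """Return all values v with low_key <= v <= high_key from a sorted list."""
        left = bisect.bisect_left(sorted_items, low_key)
        right = bisect.bisect_right(sorted_items, high_key)
        return sorted_items[left:right]

    # Example usage:
    # range_search([1, 3, 7, 9, 12], 3, 9)  -> [3, 7, 9]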

Tree search algorithms are likely the most used searching techniques for structured data. They examine trees of nodes, whether the tree is explicit or implicit (i.e., generated during the search algorithm's execution). The basic principle is that a node is taken from a data structure, its successors are examined, and they are added back to the data structure. By manipulating the data structure, the tree can be explored in different orders, for instance level by level (breadth-first search) or by reaching a leaf node first and backtracking (depth-first search). Other examples of tree searches include iterative-deepening search, depth-limited search, bidirectional search, and uniform-cost search.
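
The following minimal Python sketch illustrates this principle: the same loop performs breadth-first or depth-first search depending on whether the frontier is treated as a queue or a stack. The representation of the tree as a dictionary of child lists is an assumption of this example.

    from collections import deque

    def tree_search(root, children, goal, breadth_first=True):
        """Generic tree search: a FIFO frontier gives BFS, a LIFO frontier gives DFS."""
        frontier = deque([root])
        while frontier:
            node = frontier.popleft() if breadth_first else frontier.pop()
            if node == goal:
                return node
            # Add the node's successors to the frontier.
            frontier.extend(children.get(node, []))
        return None  # goal not found

    # Example usage with a small explicit tree:
    # children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
    # tree_search("A", children, "E")                       # breadth-first
    # tree_search("A", children, "E", breadth_first=False)  # depth-first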

The efficiency of a tree search is highly dependent upon the number and structure of the nodes in relation to the number of items stored on each node. If a large number of items are stored on one or more nodes, a different search technique may be needed to locate items within that particular set with adequate performance. In other words, a tree search is not mutually exclusive with any other search technique that may be used for specific sets; it is a method of reducing the number of relevant items to be searched to those within certain branches of the tree, thus reducing running time. For example, the Greater London telephone directory may contain entries for 20,000+ people whose surname is 'Smith', belonging on a tree branch on which are found 'surnames beginning with S'. The list of names may, or may not be, further ordered (i.e., structured) by subscribers' first names or initials. A binary search may be appropriate to locate a particular person with the given name 'Alice', and perhaps thereafter a linear search to locate the address of a specific Alice Smith.

Many of the problems in graph theory can be solved using graph traversal algorithms, such as Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and Prim's algorithm. These can be seen as extensions of the tree-search algorithms.
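
As one example, the following is a minimal Python sketch of Dijkstra's algorithm using a priority queue; the adjacency-list representation and the sample graph are assumptions made for this sketch.

    import heapq

    def dijkstra(graph, source):
        """Shortest-path distances from source in a graph given as
        {node: [(neighbour, weight), ...]} with non-negative weights."""
        distances = {source: 0}
        queue = [(0, source)]
        while queue:
            dist, node = heapq.heappop(queue)
            if dist > distances.get(node, float("inf")):
                continue  # stale queue entry for a node already improved
            for neighbour, weight in graph.get(node, []):
                new_dist = dist + weight
                if new_dist < distances.get(neighbour, float("inf")):
                    distances[neighbour] = new_dist
                    heapq.heappush(queue, (new_dist, neighbour))
        return distances

    # Example usage:
    # graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
    # dijkstra(graph, "A")  -> {"A": 0, "B": 1, "C": 3}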

In an informed search, a heuristic that is specific to the problem is used as a guide. A good heuristic will make an informed search dramatically outperform any uninformed search.

There are few prominent informed list-search algorithms. A possible member of that category is a hash table with a hashing function that is a heuristic based on the problem at hand. Most informed search algorithms explore trees.[citation needed] These include best-first search and A*. Like the uninformed algorithms, they can be extended to work for graphs as well.
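
The following is a minimal Python sketch of A*; the callback names (neighbours, heuristic) are placeholders invented for this example, and the heuristic is assumed never to overestimate the true remaining cost.

    import heapq

    def a_star(start, goal, neighbours, heuristic):
        """A* search: always expands the node with the lowest cost-so-far plus
        heuristic estimate. neighbours(n) yields (successor, step_cost) pairs."""
        g = {start: 0}  # cost of the best known path to each node
        frontier = [(heuristic(start), start)]
        while frontier:
            _, node = heapq.heappop(frontier)
            if node == goal:
                return g[node]  # cost of an optimal path to the goal
            for succ, cost in neighbours(node):
                new_g = g[node] + cost
                if new_g < g.get(succ, float("inf")):
                    g[succ] = new_g
                    heapq.heappush(frontier, (new_g + heuristic(succ), succ))
        return None  # no path found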

In games such as chess, programs typically maintain, and recalculate as needed, a game tree of all possible moves by both players from the current position, together with the resulting board configurations; searching this tree yields an effective playing strategy. This type of problem has the distinctive characteristic that every possible move the opponent might make must be accounted for. Game-playing computer programs, as well as other forms of artificial intelligence such as machine planning, often use search algorithms like the minimax algorithm, search tree pruning, and alpha-beta pruning.
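
The following is a minimal Python sketch of minimax with alpha-beta pruning; the successors and evaluate callbacks are placeholders invented for this example.

    def alphabeta(state, depth, alpha, beta, maximizing, successors, evaluate):
        """Minimax with alpha-beta pruning. successors(state) yields child states;
        evaluate(state) scores a position from the maximizing player's viewpoint."""
        children = list(successors(state))
        if depth == 0 or not children:
            return evaluate(state)
        if maximizing:
            value = float("-inf")
            for child in children:
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                             successors, evaluate))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break  # beta cut-off: the minimizing player avoids this branch
            return value
        else:
            value = float("inf")
            for child in children:
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                             successors, evaluate))
                beta = min(beta, value)
                if alpha >= beta:
                    break  # alpha cut-off: the maximizing player avoids this branch
            return value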

Constraint satisfaction

This is a type of search which solves constraint satisfaction problems rather than looking for a data value. The solution sought is a set of values assigned to a set of variables. Because the variables can be processed in any order, the usual tree search algorithms are not suitable. Methods of solving constraint problems include combinatorial search and backtracking, both of which take advantage of the freedom associated with constraint problems. A common technique used in backtracking is 'constraint propagation', a generalization of 'forward checking'. Other local search algorithms, such as genetic algorithms, which minimize the conflicts, may also be practical.
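
The following is a minimal Python sketch of plain backtracking (without constraint propagation); the variable, domain, and consistency-check structures are assumptions made for this example, and a small graph-colouring usage example is sketched in the comments.

    def backtrack(assignment, variables, domains, consistent):
        """Backtracking search: assign variables one at a time and undo (backtrack)
        as soon as the partial assignment violates a constraint."""
        if len(assignment) == len(variables):
            return assignment  # every variable has a consistent value
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            candidate = {**assignment, var: value}
            if consistent(candidate):
                result = backtrack(candidate, variables, domains, consistent)
                if result is not None:
                    return result
        return None  # no consistent assignment exists under this partial assignment

    # Example: colour three mutually adjacent regions with three colours.
    # variables = ["A", "B", "C"]
    # domains = {v: ["red", "green", "blue"] for v in variables}
    # def consistent(a):
    #     pairs = [("A", "B"), ("B", "C"), ("A", "C")]
    #     return all(a[x] != a[y] for x, y in pairs if x in a and y in a)
    # backtrack({}, variables, domains, consistent)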

In the minmax algorithm, one first takes all the minimum values and then, from them, takes the maximum value. Naturally, it is the reverse for the maxmin algorithm.
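
A one-line illustration in Python, on a made-up table of values:

    # Take the minimum of each group, then the maximum of those minima;
    # maxmin reverses the two steps. The values are invented for this example.
    values = [[3, 5, 2], [4, 1, 6], [7, 0, 8]]
    minmax = max(min(group) for group in values)  # -> 2
    maxmin = min(max(group) for group in values)  # -> 5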
