|This article needs additional citations for verification. (October 2013)|
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.
A decision tree is a flowchart-like structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label (decision taken after computing all attributes). A path from root to leaf represents classification rules.
In decision analysis a decision tree and the closely related influence diagram is used as a visual and analytical decision support tool, where the expected values (or expected utility) of competing alternatives are calculated.
A decision tree consists of 3 types of nodes:
- Decision nodes - commonly represented by squares
- Chance nodes - represented by circles
- End nodes - represented by triangles
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. If in practice decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.
Decision trees, influence diagrams, utility functions, and other decision analysis tools and methods are taught to undergraduate students in schools of business, health economics, and public health, and are examples of operations research or management science methods.
Decision tree building blocks
Decision tree elements
Drawn from left to right, a decision tree has only burst nodes (splitting paths) but no sink nodes (converging paths). Therefore, used manually, they can grow very big and are then often hard to draw fully by hand. Traditionally, decision trees have been created manually - as the aside example shows - although increasingly, specialized software is employed.
Decision tree using flow chart symbols
Commonly a decision tree is drawn using flow chart symbols as it is easier for many to read and understand.
The basic interpretation in this situation is that the company prefers B's risk and payoffs under realistic risk preference coefficients (greater than $400K—in that range of risk aversion, the company would need to model a third strategy, "Neither A nor B").
Decision trees can be used to optimize an investment portfolio. The following example shows a portfolio of 7 investment options (projects). The organization has $10,000,000 available for the total investment. Bold lines mark the best selection 1, 3, 5, 6, and 7, which will cost $9,750,000 and create a payoff of 16,175,000. All other combinations would either exceed the budget or yield a lower payoff.
A decision tree can be represented more compactly as an influence diagram, focusing attention on the issues and relationships between events.
The squares represent decisions, the ovals represent action, and the diamond represents results.
Advantages and disadvantages
Amongst decision support tools, decision trees (and influence diagrams) have several advantages. Decision trees:
- Are simple to understand and interpret. People are able to understand decision tree models after a brief explanation.
- Have value even with little hard data. Important insights can be generated based on experts describing a situation (its alternatives, probabilities, and costs) and their preferences for outcomes.
- Possible scenarios can be added
- Worst, best and expected values can be determined for different scenarios
- Use a white box model. If a given result is provided by a model.
- Can be combined with other decision techniques. The following example uses Net Present Value calculations, PERT 3-point estimations (decision #1) and a linear distribution of expected outcomes (decision #2):
Disadvantages of decision trees:
- For data including categorical variables with different number of levels, information gain in decision trees are biased in favor of those attributes with more levels.
- Calculations can get very complex particularly if many values are uncertain and/or if many outcomes are linked.
- Y. Yuan and M.J. Shaw, Induction of fuzzy decision trees. Fuzzy Sets and Systems 69 (1995), pp. 125–139
- Deng,H.; Runger, G.; Tuv, E. (2011). "Bias of importance measures for multi-valued attributes and solutions". Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN).
- Cha, Sung-Hyuk; Tappert, Charles C (2009). "A Genetic Algorithm for Constructing Compact Binary Decision Trees". Journal of Pattern Recognition Research 4 (1): 1–13.
|Wikimedia Commons has media related to decision diagrams.|