A parse tree (also called a parsing tree, derivation tree, or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. Parse trees are usually constructed according to one of two competing relations: the constituency relation of constituency grammars (= phrase structure grammars) or the dependency relation of dependency grammars. Parse trees are distinct from abstract syntax trees (also known simply as syntax trees) in that their structure and elements more concretely reflect the syntax of the input language. Parse trees may be generated for sentences in natural languages (see natural language processing), as well as during the processing of computer languages, such as programming languages.
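The contrast with abstract syntax trees can be made concrete with a small sketch. The example below is an illustration chosen here, not part of the original discussion: it uses Python's standard `ast` module, which produces abstract syntax trees, to show that purely concrete details such as redundant parentheses leave no trace in the abstract tree, whereas a parse tree would normally keep them.

```python
import ast

# Two source strings that differ only in concrete syntax (redundant parentheses).
plain = ast.parse("1 + 2", mode="eval")
parenthesized = ast.parse("(1 + 2)", mode="eval")

# ast.dump omits line and column attributes by default,
# so the comparison below looks at tree structure only.
same_structure = ast.dump(plain) == ast.dump(parenthesized)
print(same_structure)  # True
```

Both inputs yield the same abstract tree, so the parentheses cannot be recovered from it.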
Notes on terminology
The term parse tree is used primarily in computational linguistics; theoretical syntax tends to prefer the term syntax tree. Sentence diagramming as taught in schools uses the term sentence diagram. These school diagrams (Reed-Kellogg diagrams) are, however, quite different from the parse trees of computational linguistics and the syntax trees of theoretical linguistics.
Constituency-based parse trees
The constituency-based parse trees of constituency grammars (= phrase structure grammars) distinguish between terminal and non-terminal nodes. The interior nodes are labeled by non-terminal categories of the grammar, while the leaf nodes are labeled by terminal categories. The image below represents a constituency-based parse tree; it shows the syntactic structure of the English sentence John hit the ball:
The parse tree is the entire structure, starting from S and ending in each of the leaf nodes (John, hit, the, ball). The following abbreviations are used in the tree:
- S for sentence, the top-level structure in this example
- NP for noun phrase
- VP for verb phrase
- V for verb
- D for determiner
- N for noun
Each node in the tree is either a root node, a branch node, or a leaf node. A root node is a node that has no branches above it. Within a sentence, there is only ever one root node. A branch node is a mother node that connects to two or more daughter nodes. A leaf node, by contrast, is a terminal node that does not dominate other nodes in the tree. S is the root node, NP and VP are branch nodes, and John (N), hit (V), the (D), and ball (N) are all leaf nodes. The leaves are the lexical tokens of the sentence. A node can also be referred to as a parent node or a child node. A parent node is one that has at least one other node linked by a branch under it. In the example, S is a parent of both N and VP. A child node is one that has a node directly above it to which it is linked by a branch of the tree. In the example, hit is a child node of V. The terms mother and daughter are also sometimes used for this relationship.
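The node terminology above can be sketched in a few lines of Python. The tree shape used here is an assumption reconstructed from the description in the text (S dominates N and VP, VP dominates V and NP, NP dominates D and N). The `Node` class and the helper functions `leaves`, `bracketed`, and `count_nodes` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf dominates no other node.
        return not self.children


def leaves(node: Node) -> List[str]:
    """Collect leaf labels from left to right."""
    if node.is_leaf():
        return [node.label]
    result: List[str] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def bracketed(node: Node) -> str:
    """Render the tree in labeled bracket notation."""
    if node.is_leaf():
        return node.label
    inner = " ".join(bracketed(child) for child in node.children)
    return "[" + node.label + " " + inner + "]"


def count_nodes(node: Node) -> int:
    """Count every node, including category nodes and words."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Assumed structure for "John hit the ball"; S is the root node.
tree = Node("S", [
    Node("N", [Node("John")]),
    Node("VP", [
        Node("V", [Node("hit")]),
        Node("NP", [
            Node("D", [Node("the")]),
            Node("N", [Node("ball")]),
        ]),
    ]),
])

print(leaves(tree))       # ['John', 'hit', 'the', 'ball']
print(bracketed(tree))    # [S [N John] [VP [V hit] [NP [D the] [N ball]]]]
print(count_nodes(tree))  # 11
```

In this encoding the words are the leaves, each word is a child of its category node (hit is a child of V), and the parent and child relations correspond directly to the `children` lists.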
Dependency-based parse trees
The dependency-based parse trees of dependency grammars treat all nodes as terminal, which means they do not acknowledge the distinction between terminal and non-terminal categories. They are simpler on average than constituency-based parse trees because they contain far fewer nodes. The dependency-based parse tree for the example sentence above is as follows:
This parse tree lacks the phrasal categories (S, VP, and NP) seen in the constituency-based counterpart above. Like the constituency-based tree, however, it acknowledges constituent structure: any complete subtree of the tree is a constituent. Thus this dependency-based parse tree acknowledges the subject noun John and the object noun phrase the ball as constituents, just as the constituency-based parse tree does.
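The idea that every complete subtree is a constituent can be illustrated with a minimal Python sketch. The head assignments below are an assumption consistent with the description in the text (hit is the root, John and ball depend on hit, and the depends on ball). The names `words`, `heads`, `dependents`, `subtree`, and `constituents` are hypothetical and introduced only for this example.

```python
from typing import List, Optional

words: List[str] = ["John", "hit", "the", "ball"]

# heads[i] is the index of the head of word i; None marks the root.
# Assumed analysis: hit is the root, the depends on ball.
heads: List[Optional[int]] = [1, None, 3, 1]


def dependents(i: int) -> List[int]:
    """Indices of the words that depend directly on word i."""
    return [j for j, h in enumerate(heads) if h == i]


def subtree(i: int) -> List[int]:
    """Indices of the complete subtree rooted at word i, in word order."""
    indices = [i]
    for d in dependents(i):
        indices.extend(subtree(d))
    return sorted(indices)


def constituents() -> List[str]:
    """One constituent per word: the complete subtree that the word heads."""
    return [" ".join(words[j] for j in subtree(i)) for i in range(len(words))]


print(constituents())
# ['John', 'John hit the ball', 'the', 'the ball']
```

The tree has only four nodes, one per word. Under this head assignment, John and the ball come out as constituents, while hit the ball does not, because the subtree headed by hit also contains John.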
The constituency vs. dependency distinction is far-reaching. Whether the additional syntactic structure associated with constituency-based parse trees is necessary or beneficial is a matter of debate.
Notes
- See Chiswell and Hodges 2007: 34.
- See Carnie (2013:118ff.) for an introduction to the basic concepts of syntax trees (e.g. root node, terminal node, non-terminal node, etc.).
- See Aho et al. 2007.
- See for example Ágel et al. 2003/2006.
References
- Ágel, V., Ludwig Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Heringer, and Hennig Lobin (eds.) 2003/2006. Dependency and valency: An international handbook of contemporary research. Berlin: Walter de Gruyter.
- Carnie, A. 2013. Syntax: A generative introduction, 3rd edition. Malden, MA: Wiley-Blackwell.
- Chiswell, Ian and Wilfrid Hodges 2007. Mathematical logic. Oxford: Oxford University Press.
- Aho, Alfred et al. 2007. Compilers: Principles, techniques, & tools. Boston: Pearson/Addison Wesley.
External links
- Syntax Tree Editor
- Linguistic Tree Constructor
- phpSyntaxTree – Online parse tree drawing site
- phpSyntaxTree (Unicode) – Online parse tree drawing site (improved version that supports Unicode)
- Qtree – LaTeX package for drawing parse trees
- TreeForm Syntax Tree Drawing Software
- rSyntaxTree – Enhanced version of phpSyntaxTree in Ruby with Unicode and vectorized graphics
- Visual Introduction to Parse Trees – Introduction and Transformation
- OpenCourseOnline – Dependency Parse Introduction (Christopher Manning)