Parse tree

A parse tree (also called a parsing tree or concrete syntax tree)^[1] is an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. Parse trees are usually constructed according to one of two competing relations: the constituency relation of constituency grammars (also called phrase structure grammars) or the dependency relation of dependency grammars. Parse trees are distinct from abstract syntax trees (also known simply as syntax trees) in that their structure and elements more concretely reflect the syntax of the input language. Parse trees may be generated for sentences in natural languages (see natural language processing) as well as during the processing of computer languages, such as programming languages.

Constituency-based parse trees

Constituency-based parse trees distinguish between terminal and non-terminal nodes. The interior nodes are labeled by non-terminal categories of the grammar, while the leaf nodes are labeled by terminal categories. The image below represents a constituency-based parse tree; it shows the syntactic structure of the English sentence John hit the ball:

This parse tree is simplified; for more information, see X-bar theory. The parse tree is the entire structure, starting from S and ending in each of the leaf nodes (John, hit, the, ball). The following abbreviations are used in the tree:

S for sentence, the top-level structure in this example
NP for noun phrase. The first (leftmost) NP, the single noun "John", serves as the subject of the sentence; the second is the object of the sentence.
VP for verb phrase, which serves as the predicate
V for verb, in this case the transitive verb "hit"
D for determiner, in this instance the definite article "the"
N for noun
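
The relation between such a tree and a formal grammar can be sketched in code. The following Python example is only an illustration: the rewrite rules in GRAMMAR, the word lists in LEXICON, and the exact shape of TREE (in particular, a subject NP that dominates a single N) are assumptions reconstructed from the abbreviations above, not a definitive analysis. It checks that every interior node is licensed by a rule and prints the tree in labeled bracket notation.

```python
# Toy phrase structure grammar: each non-terminal maps to the sequences of
# child labels it may dominate. These rules are an illustrative assumption.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("D", "N")],
    "VP": [("V", "NP")],
}

# Word-level categories and the words they may dominate (also assumed).
LEXICON = {"N": {"John", "ball"}, "V": {"hit"}, "D": {"the"}}

# A tree is a tuple (label, child, ...); a leaf is a plain string (a word).
TREE = ("S",
        ("NP", ("N", "John")),
        ("VP", ("V", "hit"),
               ("NP", ("D", "the"), ("N", "ball"))))


def is_licensed(node):
    """Return True if every interior node matches a grammar or lexicon entry."""
    label, *children = node
    # A word-level category dominating exactly one word.
    if len(children) == 1 and isinstance(children[0], str):
        return children[0] in LEXICON.get(label, set())
    # Words may not be mixed with phrasal children in this toy grammar.
    if any(isinstance(child, str) for child in children):
        return False
    child_labels = tuple(child[0] for child in children)
    return (child_labels in GRAMMAR.get(label, [])
            and all(is_licensed(child) for child in children))


def bracketed(node):
    """Render the tree in labeled bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"


if __name__ == "__main__":
    print(is_licensed(TREE))
    print(bracketed(TREE))
```

Nested tuples keep the sketch short; the order of the children in each tuple mirrors the left-to-right order of the words, which is what makes the tree ordered.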

Each node in the tree is either a root node, a branch node, or a leaf node. S is the root node, interior nodes such as NP and VP are branch nodes, and John, hit, the, and ball are all leaf nodes. The leaves are the lexical tokens of the sentence.^[2] A node can also be referred to as a parent node or a child node. A parent node is one that has at least one other node linked by a branch under it. In the example, S is a parent of both NP and VP. A child node is one that has at least one node directly above it to which it is linked by a branch of the tree. In the example, hit is a child node of V. The terms mother and daughter are also sometimes used for this relationship.
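
The node terminology can be illustrated with a small data structure. The Python sketch below is a hypothetical illustration rather than a standard representation: the Node class, its kind and leaves methods, and the assumed tree shape (a subject NP over a single N) are all introduced here for the example. Each node records its parent and its ordered children, from which the root, branch, and leaf roles follow.

```python
class Node:
    """A parse tree node with an ordered list of children and a parent link."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self  # each child has exactly one parent

    def kind(self):
        """Classify the node as 'root', 'branch', or 'leaf'."""
        if self.parent is None:
            return "root"
        return "branch" if self.children else "leaf"

    def leaves(self):
        """Return the leaf labels in left-to-right order."""
        if not self.children:
            return [self.label]
        return [word for child in self.children for word in child.leaves()]


# Assumed tree for "John hit the ball".
hit = Node("hit")
v = Node("V", hit)
tree = Node("S",
            Node("NP", Node("N", Node("John"))),
            Node("VP", v,
                 Node("NP", Node("D", Node("the")), Node("N", Node("ball")))))

if __name__ == "__main__":
    print(tree.kind(), v.kind(), hit.kind())
    print([child.label for child in tree.children])
    print(hit.parent.label)
    print(tree.leaves())
```

Reading the leaves from left to right recovers the sentence, which reflects the statement that the leaves are its lexical tokens.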

Dependency-based parse trees

The dependency-based parse trees of dependency grammars^[3] treat all nodes as terminal, which means they do not acknowledge the distinction between terminal and non-terminal categories. They are simpler on average than constituency-based parse trees because they contain far fewer nodes. The dependency-based parse tree for the example sentence above is as follows:

This parse tree lacks the phrasal categories (S, VP, and NP) seen in the constituency-based counterpart above. Like the constituency-based tree, however, it acknowledges constituent structure: any complete subtree of the tree is a constituent. Thus this dependency-based parse tree acknowledges the subject noun John and the object noun phrase the ball as constituents, just as the constituency-based parse tree does.
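
The notion of a complete subtree can be made concrete in code. The Python sketch below assumes a common dependency analysis of the example sentence, in which the verb hit is the root, John and ball depend on hit, and the depends on ball; the HEAD encoding and the function names are hypothetical choices made for this illustration. The tree has exactly one node per word, and the constituents are read off as the complete subtrees.

```python
WORDS = ["John", "hit", "the", "ball"]

# HEAD[i] is the index of the word that word i depends on; None marks the root.
# This analysis is an assumption made for the example.
HEAD = [1, None, 3, 1]


def dependents(i):
    """Indices of the words that depend directly on word i."""
    return [j for j, head in enumerate(HEAD) if head == i]


def complete_subtree(i):
    """Indices of word i and every word it governs, in sentence order."""
    indices = [i]
    for j in dependents(i):
        indices.extend(complete_subtree(j))
    return sorted(indices)


def constituents():
    """One constituent per word: the words of its complete subtree."""
    return [" ".join(WORDS[k] for k in complete_subtree(i))
            for i in range(len(WORDS))]


if __name__ == "__main__":
    for phrase in constituents():
        print(phrase)
```

Under this assumed analysis, John and the ball both come out as constituents, in line with the text above. The string hit the ball does not, because the complete subtree rooted at hit is the whole sentence.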

The constituency vs. dependency distinction is far-reaching. Whether the additional syntactic structure associated with constituency-based parse trees is necessary or beneficial is a matter of debate.

Notes

^[1] See Chiswell and Hodges 2007: 34.
^[2] See Aho et al. 2007.
^[3] See for example Ágel et al. 2003/2006.

References

Ágel, Vilmos, Ludwig Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Heringer, and Hennig Lobin (eds.) 2003/2006. Dependency and valency: An international handbook of contemporary research. Berlin: Walter de Gruyter.
Aho, Alfred et al. 2007. Compilers: Principles, techniques, & tools. Boston: Pearson/Addison Wesley.
Chiswell, Ian and Wilfrid Hodges 2007. Mathematical logic. Oxford: Oxford University Press.

External links

Syntax Tree Editor
Linguistic Tree Constructor
phpSyntaxTree – Online parse tree drawing site
phpSyntaxTree (Unicode) – Online parse tree drawing site (improved version that supports Unicode)
Qtree – LaTeX package for drawing parse trees
TreeForm Syntax Tree Drawing Software
rSyntaxTree – Enhanced version of phpSyntaxTree in Ruby with Unicode and vectorized graphics
