GermaNet

GermaNet is a lexical-semantic net for the German language that relates nouns, verbs, and adjectives semantically by grouping lexical units that express the same concept into synsets and by defining semantic relations between these synsets.[1] GermaNet has much in common with the English WordNet and can be viewed as an online thesaurus or a light-weight ontology. It has been developed and maintained since 1997 within various projects at the research group for General and Computational Linguistics at the University of Tübingen. It has been integrated into EuroWordNet, a multilingual lexical-semantic database.[2]

Database

Contents

GermaNet partitions the lexical space into a set of concepts that are interlinked by semantic relations. A semantic concept is modeled by a synset: a set of words (called lexical units) that are all taken to have (almost) the same meaning. A synset is thus a set representation of the semantic relation of synonymy; it consists of a list of lexical units and a definition (paraphrase). The lexical units in turn have frames (which specify syntactic valence) and examples of their use.[3] Just as in WordNet, for each word category the semantic space is divided into a number of semantic fields closely related to major nodes in the semantic network, such as Ort ("location") and Körper ("body").[2]
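The structure described above can be sketched as a small data model. The following Python example is only an illustration under stated assumptions: the class and field names (`LexicalUnit`, `Synset`, `orth_form`, `relations`) are hypothetical and are not taken from GermaNet's own APIs, and the sample entries are invented for demonstration rather than copied from the database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LexicalUnit:
    """One word in one sense (hypothetical model, not the GermaNet API)."""
    orth_form: str
    frames: List[str] = field(default_factory=list)    # syntactic valence frames
    examples: List[str] = field(default_factory=list)  # usage examples


@dataclass
class Synset:
    """A concept: lexical units taken to have (almost) the same meaning."""
    category: str        # "noun", "verb" or "adjective"
    semantic_field: str  # e.g. "Ort" (location)
    paraphrase: str      # definition of the concept
    lexical_units: List[LexicalUnit] = field(default_factory=list)
    # Semantic relations to other synsets as (relation name, target) pairs.
    relations: List[Tuple[str, "Synset"]] = field(default_factory=list)

    def synonyms(self) -> List[str]:
        """Return the written forms of all lexical units in this synset."""
        return [unit.orth_form for unit in self.lexical_units]


# Invented sample entries, for illustration only.
place = Synset(
    category="noun",
    semantic_field="Ort",
    paraphrase="a point or area in space",
    lexical_units=[LexicalUnit("Ort"), LexicalUnit("Stelle")],
)
city = Synset("noun", "Ort", "a large settlement", [LexicalUnit("Stadt")])
city.relations.append(("hypernym", place))

print(place.synonyms())
```

In this sketch synonymy is implicit in membership of the same synset, while relations between concepts are stored as links between synsets.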

The following statistics describe the contents of GermaNet version 6.0 (released April 2011):

  • Number of synsets: 69594
    • Of which adjectives: 5991
    • Of which nouns: 53753
    • Of which verbs: 9850
  • Number of lexical units: 93407
    • Of which adjectives: 8582
    • Of which nouns: 71844
    • Of which verbs: 12981 [2]
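
As a quick consistency check on the figures above, the per-category counts can be summed and compared with the stated totals. The short Python sketch below uses only the numbers listed here; the variable names are arbitrary, and the ratio of lexical units to synsets is a derived value, not a figure published by the GermaNet project.

```python
# Counts copied from the GermaNet 6.0 statistics listed above.
synsets = {"adjectives": 5991, "nouns": 53753, "verbs": 9850}
lexical_units = {"adjectives": 8582, "nouns": 71844, "verbs": 12981}

total_synsets = sum(synsets.values())
total_units = sum(lexical_units.values())

# The per-category counts add up to the stated totals.
assert total_synsets == 69594
assert total_units == 93407

# Derived value: average number of lexical units per synset, by category.
for category in synsets:
    ratio = lexical_units[category] / synsets[category]
    print(f"{category}: {ratio:.2f} lexical units per synset")

print(f"overall: {total_units / total_synsets:.2f} lexical units per synset")
```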

Format

All GermaNet data is stored in a relational PostgreSQL database. The database model follows the internal structure of GermaNet: there are tables to store synsets, lexical units, conceptual and lexical relations, etc.[3] The distribution format of all GermaNet data is XML. Two types of files, one for synsets and the other for relations, together represent all data that is available in the GermaNet database.
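
The split into synset files and relation files can be illustrated with a short parsing sketch. The Python example below uses the standard `xml.etree.ElementTree` module; the XML snippets and all element and attribute names in them (`synset`, `lexUnit`, `orthForm`, `paraphrase`, `con_rel`, `class`, `from`, `to`) are assumptions made for illustration and may differ from the actual GermaNet distribution files, and the sample entries are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical synset file; the layout and names are assumptions.
SYNSET_XML = """
<synsets>
  <synset id="s1" category="noun" class="Ort">
    <lexUnit id="l1"><orthForm>Ort</orthForm></lexUnit>
    <lexUnit id="l2"><orthForm>Stelle</orthForm></lexUnit>
    <paraphrase>a point or area in space</paraphrase>
  </synset>
  <synset id="s2" category="noun" class="Ort">
    <lexUnit id="l3"><orthForm>Stadt</orthForm></lexUnit>
    <paraphrase>a large settlement</paraphrase>
  </synset>
</synsets>
"""

# Hypothetical relation file linking synsets by their ids.
RELATION_XML = """
<relations>
  <con_rel name="hypernym" from="s2" to="s1"/>
</relations>
"""


def load_synsets(xml_text):
    """Map each synset id to its category, field, words and paraphrase."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for node in root.iter("synset"):
        result[node.get("id")] = {
            "category": node.get("category"),
            "field": node.get("class"),
            "words": [form.text for form in node.iter("orthForm")],
            "paraphrase": node.findtext("paraphrase", default=""),
        }
    return result


def load_relations(xml_text):
    """Return conceptual relations as (name, source id, target id) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [(rel.get("name"), rel.get("from"), rel.get("to"))
            for rel in root.iter("con_rel")]


synsets = load_synsets(SYNSET_XML)
relations = load_relations(RELATION_XML)

# Resolve each relation against the synset table.
for name, source, target in relations:
    print(synsets[source]["words"], name, synsets[target]["words"])
```

Keeping relations in a separate file that refers to synsets only by identifier mirrors the relational layout, in which relations are stored in their own tables.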

Interfaces

Several application programming interfaces (APIs) are available for Java[4] and for Perl. These APIs are distributed freely and provide access to all information in various versions of GermaNet.

Licenses

GermaNet 6.0 (released April 2011) can be distributed under one of the following types of license agreements: Academic Research Agreement, Research and Development Agreement, or Commercial Agreement. GermaNet is free for academic use.

Applications

GermaNet has been used for a variety of applications, including semantic analysis, shallow recognition of implicit document structure, and compound analysis,[5] as well as the analysis of selectional preferences[6] and word sense disambiguation.[7]

References

  1. Petra Storjohann (23 June 2010). Lexical-semantic relations: theoretical and practical perspectives. John Benjamins Publishing Company. pp. 165–. ISBN 978-90-272-3138-3. Retrieved 16 November 2011.
  2. GermaNet homepage
  3. V. Henrich, E. Hinrichs. 2010. GernEdiT - The GermaNet Editing Tool. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation.
  4. GermaNet APIs in Java
  5. Manuela Kunze and Dietmar Rösner. 2004. Issues in Exploiting GermaNet as a Resource in Real Applications.
  6. Sabine Schulte im Walde. 2004. GermaNet Synsets as Selectional Preferences in Semantic Verb Clustering.
  7. Saito et al. 2002. Evaluation of GermaNet: Problems Using GermaNet for Automatic Word Sense Disambiguation.