Semantic lexicon

A semantic lexicon is a digital dictionary of words labeled with semantic classes so that associations can be drawn between words that have not previously been encountered.^[1] Semantic lexicons are built upon semantic networks, which represent the semantic relations between words. The difference between the two is that a semantic lexicon also provides a definition, or "gloss", for each word.^[2]

Structure

Semantic lexicons are made up of lexical entries. These entries are distinguished by meaning rather than by spelling, which eliminates issues of homonymy and polysemy. The entries are interconnected by semantic relations such as hyperonymy, hyponymy, meronymy, and troponymy. Synonymous entries are grouped together in what the Princeton WordNet calls "synsets".^[2] Most semantic lexicons are made up of four "sub-nets":^[2] nouns, verbs, adjectives, and adverbs, though some researchers have taken steps to add an "artificial node" interconnecting the sub-nets.^[3]
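The entry structure described above can be sketched as a small data type. This is only an illustration, not the storage format of WordNet or any other lexicon: the `Synset` class, its field names, the identifiers such as "bank.n.1", and the glosses are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One lexical entry: a meaning, not a spelling (hypothetical layout)."""
    name: str           # illustrative identifier, e.g. "bank.n.1"
    pos: str            # sub-net: "n", "v", "a" (adjective) or "r" (adverb)
    lemmas: List[str]   # synonymous word forms that share this meaning
    gloss: str          # short definition
    relations: Dict[str, List[str]] = field(default_factory=dict)


# Two senses of the homonym "bank" are stored as two separate entries.
# Glosses and relation targets are invented for illustration.
LEXICON = {
    "bank.n.1": Synset("bank.n.1", "n", ["bank", "depository"],
                       "a financial institution",
                       {"hypernym": ["institution.n.1"]}),
    "bank.n.2": Synset("bank.n.2", "n", ["bank", "riverbank"],
                       "sloping land beside a river",
                       {"hypernym": ["slope.n.1"]}),
}


def senses(word: str) -> List[str]:
    """Return the names of every entry whose synonym set contains `word`."""
    return sorted(s.name for s in LEXICON.values() if word in s.lemmas)
```

Because lookups go through the synonym set rather than a single spelling, `senses("bank")` finds both entries while `senses("riverbank")` finds only the second.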

Nouns

Nouns are ordered into a taxonomy: a hierarchy in which the broadest, most encompassing noun (such as "thing") sits at the top and nouns become more specific the further they are from it. The top noun in a semantic lexicon is called a unique beginner.^[4] The most specific nouns, those that have no subordinates, are terminal nodes.^[3]
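As a rough sketch of this hierarchy, a noun sub-net can be modeled as a map from each noun to its immediate superordinate. The `HYPERNYM` table and both function names are hypothetical, and the handful of nouns is a toy sample rather than data from a real lexicon.

```python
# Toy hypernym map: each noun points to its immediate superordinate.
# "thing" has no entry of its own, so it acts as the unique beginner.
HYPERNYM = {
    "animal": "thing",
    "dog": "animal",
    "Rhodesian Ridgeback": "dog",
    "furniture": "thing",
    "chair": "furniture",
    "stool": "chair",
}


def path_to_top(noun):
    """Follow superordinate links from `noun` up to the unique beginner."""
    path = [noun]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def terminal_nodes():
    """Nouns that never appear as anyone's superordinate."""
    all_nouns = set(HYPERNYM) | set(HYPERNYM.values())
    superordinates = set(HYPERNYM.values())
    return sorted(all_nouns - superordinates)
```

Here `path_to_top("stool")` climbs through "chair" and "furniture" to "thing", and `terminal_nodes()` returns the two nouns that have no subordinates in this sample.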

Semantic lexicons also distinguish between types and instances. A type shares the characteristics of its superordinate, as a Rhodesian Ridgeback is a type of dog. An instance is a single example of a category, as Dave Grohl is an instance of a musician. Instances are always terminal nodes because they are solitary and have no other words or ontological categories belonging to them.^[2]

Semantic lexicons also address meronymy,^[5] which is a "part-to-whole" relationship: keys, for example, are part of a laptop. The attributes that necessarily define a specific entry are also necessarily present in that entry's hyponyms. So, if a computer has keys, and a laptop is a type of computer, then a laptop must have keys. However, there are many cases where this inheritance becomes vague. A good example is the chair. Most would define a chair as having legs and a seat (the part one sits on). However, some artistic or modern chairs have no legs at all. Beanbags also have no legs, but few would argue that they aren't chairs. Questions like these drive research in the fields of taxonomy and ontology.
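The inheritance rule in the laptop example can be written down in a few lines. This is a minimal sketch under assumed data: the `HYPERNYM` and `PARTS` tables and the "machine" and "hinge" entries are invented for illustration.

```python
# Each entry points to its superordinate (toy data).
HYPERNYM = {"laptop": "computer", "computer": "machine"}

# Parts recorded directly on an entry; "hinge" is a made-up extra part.
PARTS = {"computer": {"keys"}, "laptop": {"hinge"}}


def inherited_parts(entry):
    """Collect the parts of `entry` plus those of all its superordinates."""
    parts = set()
    while entry is not None:
        parts |= PARTS.get(entry, set())
        entry = HYPERNYM.get(entry)  # None once the top is reached
    return parts
```

A strict rule like this is exactly what the chair example strains: if "legs" were recorded as a part of chair, every subordinate, beanbag included, would inherit them.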

Verbs

Verb synsets are arranged much like their noun counterparts: the more general and encompassing verbs are near the top of the hierarchy, while troponyms (verbs that describe a more specific way of doing something) are grouped beneath. Verb specificity moves along a vector, with the verbs becoming more and more specific in reference to a certain quality.^[2] For example, the set "walk / run / sprint" becomes more specific in terms of speed, and "dislike / hate / abhor" becomes more specific in terms of the intensity of the emotion.

The ontological groupings and separations of verbs are far more arguable than those of nouns. It is widely accepted that a dog is a type of animal and that a stool is a type of chair, but it can be argued that abhor is on the same emotional plane as hate (that they are synonyms rather than superordinate and subordinate). It can likewise be argued either that love and adore are synonyms or that one is more specific than the other. Thus, the relations between verbs are not as agreed-upon as those between nouns.

Another attribute of verb synset relations is that verbs are also ordered into pairs. In these pairs, one verb necessarily entails the other, in the way that massacre entails kill and know entails believe.^[2] The verbs in a pair can be a troponym and its superordinate, as in the first example, or they can belong to completely different ontological categories, as in the second.
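The two kinds of entailment pair can be told apart by checking whether the entailing verb also sits below the entailed verb in the troponym hierarchy. The sketch below uses only the verbs mentioned in this section; the table layout, function names, and result labels are hypothetical.

```python
# Troponym links: each verb points to its more general superordinate.
TROPONYM_OF = {"massacre": "kill", "abhor": "hate", "hate": "dislike"}

# Ordered entailment pairs: the first verb necessarily entails the second.
ENTAILS = {("massacre", "kill"), ("know", "believe")}


def is_troponym_of(verb, general):
    """True if `general` is reachable from `verb` via troponym links."""
    while verb in TROPONYM_OF:
        verb = TROPONYM_OF[verb]
        if verb == general:
            return True
    return False


def classify_entailment(a, b):
    """Label the pair (a, b) by how the two verbs are related."""
    if (a, b) not in ENTAILS:
        return "no entailment recorded"
    if is_troponym_of(a, b):
        return "troponym entailment"
    return "cross-category entailment"
```

Entailment is directional in this model, so `classify_entailment("kill", "massacre")` reports no recorded entailment even though the reverse pair is present.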

Adjectives

Adjective synset relations are very similar to verb synset relations. They are not quite as neatly hierarchical as the noun synset relations, and they have fewer tiers and more terminal nodes. However, adjective synset relations generally have fewer terminal nodes per ontological category than verb synset relations do. Adjectives in semantic lexicons are organized in word pairs as well, with the difference being that their word pairs are antonyms instead of entailments. More generic polar adjectives, such as hot and cold or happy and sad, are paired. Other adjectives that are semantically similar are then linked to each of these words. Hot is linked to warm, heated, sizzling, and sweltering, while cold is linked to cool, chilly, freezing, and nippy. These semantically similar adjectives are considered indirect antonyms^[2] of the opposite polar adjective (for example, nippy is an indirect antonym of hot). Adjectives that are derived from a verb or a noun are also directly linked to that verb or noun across sub-nets. For example, enjoyable is linked to the semantically similar adjectives agreeable and pleasant, as well as to its origin verb, enjoy.
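The direct and indirect antonym links for the hot/cold example might be resolved as follows. The dictionary names and the `antonym` function are assumptions made for this sketch, not part of any particular lexicon's interface.

```python
# Polar adjectives are paired directly with each other.
DIRECT_ANTONYM = {"hot": "cold", "cold": "hot"}

# Semantically similar adjectives are linked to one of the polar heads.
SIMILAR_TO = {
    "hot": ["warm", "heated", "sizzling", "sweltering"],
    "cold": ["cool", "chilly", "freezing", "nippy"],
}

# Reverse index: each similar adjective maps back to its polar head.
HEAD_OF = {adj: head for head, adjs in SIMILAR_TO.items() for adj in adjs}


def antonym(adj):
    """Return (antonym, kind), where kind is "direct" or "indirect"."""
    if adj in DIRECT_ANTONYM:
        return DIRECT_ANTONYM[adj], "direct"
    head = HEAD_OF.get(adj)
    if head is None:
        return None  # adjective not covered by this toy table
    return DIRECT_ANTONYM[head], "indirect"
```

An indirect antonym is found by hopping to the polar head first, so nippy resolves to hot through cold.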

Adverbs

There are very few adverbs accounted for in semantic lexicons. This is because most adverbs are taken directly from their adjective counterparts, in both meaning and form, and changed only morphologically (e.g. happily is derived from happy, and luckily is derived from lucky, which is derived from luck). The only adverbs that are accounted for specifically are ones without these connections, such as really, mostly, and hardly.^[2]

Challenges facing semantic lexicons

The effects of the Princeton WordNet project extend far past English, though most research in the field revolves around the English language. Creating semantic lexicons for other languages has proved to be very useful for natural language processing applications. One of the main focuses of research in semantic lexicons is linking the lexicons of different languages to aid in machine translation. The most common approach is to attempt to create a shared ontology that serves as a "middleman" of sorts between the semantic lexicons of two different languages.^[6] This is an extremely challenging and as yet unsolved problem in machine translation. One issue arises from the fact that no two languages are word-for-word translations of each other. That is, every language has some sort of structural or syntactic difference from every other. In addition, languages often have words that do not translate easily into other languages, and certainly not with an exact word-to-word match. Proposals have been made to create a set framework for wordnets. Research has shown that every known human language has some sort of concept resembling synonymy, hyponymy, meronymy, and antonymy. However, every idea proposed so far has been met with criticism for using a pattern that works best for English and less well for other languages.^[6]
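The "middleman" idea can be pictured as two lookups that meet at a shared concept identifier. The sketch below is deliberately simplistic and is not how any actual multilingual wordnet is implemented: the concept labels (such as "C_DOG"), the table names, and the choice of English and German are all assumptions made for the example.

```python
# English word -> language-neutral concept identifier (hypothetical labels).
EN_TO_CONCEPT = {
    "dog": "C_DOG",
    "chair": "C_CHAIR",
    "serendipity": "C_SERENDIPITY",
}

# Concept identifier -> German words that lexicalize it.
# "C_SERENDIPITY" is left out on purpose to model a lexical gap.
DE_FROM_CONCEPT = {
    "C_DOG": ["Hund"],
    "C_CHAIR": ["Stuhl"],
}


def translate(word):
    """Map an English word to German words via the shared concept layer."""
    concept = EN_TO_CONCEPT.get(word)
    if concept is None:
        return []  # word is missing from the English lexicon
    return DE_FROM_CONCEPT.get(concept, [])  # empty list on a lexical gap
```

The empty result for "serendipity" illustrates the problem noted above: a concept lexicalized in one language may have no single-word counterpart in the other.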

Another obstacle in the field is that no solid guidelines exist for the framework and contents of a semantic lexicon. Each lexicon project in each language has taken a slightly (or not so slightly) different approach to its wordnet. There is not even an agreed-upon definition of what a "word" is. Orthographically, a word is defined as a string of letters with spaces on either side, but semantically the definition is much debated. For example, it is not difficult to define dog or rod as words, but what about guard dog or lightning rod? The latter two would be considered two orthographic words each, though semantically each makes up one concept: one is a type of dog and one is a type of rod. In addition to these confusions, wordnets are also idiosyncratic, in that they do not consistently label items. They are redundant, in that they often have several words assigned to each meaning (synsets). They are also open-ended, in that they often focus on and extend into terminology and domain-specific vocabulary.^[6]
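One common way to reconcile orthographic words with semantic entries is to match the longest known multiword entry first. The sketch below assumes a toy entry set and a hypothetical `segment` function; it is one possible policy, not a rule prescribed by any wordnet.

```python
# Toy set of lexicon entries, including two multiword entries.
ENTRIES = {"dog", "rod", "guard", "lightning", "guard dog", "lightning rod"}


def segment(text, max_len=3):
    """Split text into lexical units, preferring the longest known entry."""
    tokens = text.lower().split()
    units, i = [], 0
    while i < len(tokens):
        # Try the longest span first, then shrink down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # A single token is always accepted, even if it is unknown.
            if candidate in ENTRIES or n == 1:
                units.append(candidate)
                i += n
                break
    return units
```

With this policy, "guard dog" is returned as one unit even though "guard" and "dog" are also entries on their own.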

Other names

wordnet
computational lexicon

List of semantic lexicons

References

[1] Theng, Yin-Leng (2009). Handbook of Research on Digital Libraries: Design, Development, and Impact. University of Michigan: Information Science Reference. ISBN 9781599048796.
[2] "About WordNet".
[3] Lemnitzer, L. "Enriching GermaNet: a case study of lexical acquisition". Seminar für Sprachwissenschaft, Universität Tübingen.
[4] Boyd-Graber, J. (2006). "Adding Dense, Weighted Connections to WordNet". Proceedings of the Third International Wordnet Conference.
[5] Hinrichs, E. (December 2012). "Using part-whole relations for automatic deduction of compound-international relations in GermaNet". International Journal on Semantic Web and Information Systems. 3.
[6] Fellbaum, C. (May 2012). "Challenges for a Multilingual Wordnet". Language Resources and Evaluation. 46 (2): 313–326. doi:10.1007/s10579-012-9186-z. S2CID 254379442.
