# WordNet

WordNet is a database of words in the English language. Unlike a dictionary, which is organized alphabetically, WordNet is organized by concept and meaning. Traditional dictionaries were created for humans, whereas computers need a lexical resource better suited to them. This is where WordNet becomes useful.

WordNet is a network of words linked by lexical and semantic relations. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, called synsets, each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the WordNet browser.

## Discussion

• What's the distinction between WordNet and a thesaurus?

A thesaurus provides similar words (synonyms) and opposites (antonyms). WordNet does much more than this. Through synsets, WordNet groups specific word senses rather than bare word forms. As a result, words found close to one another in the network are semantically disambiguated.

A synset is also linked to other synsets by semantic relations, which a thesaurus lacks. Because these relations are based on concepts, they give us valuable information about words. For example, the verbs (communicate, talk, whisper) are all about talking, but the manner goes from general to specific. A similar example with nouns is (furniture, bed, bunk bed). An example of a part-whole relation is (leg, chair). WordNet captures these sorts of relations.

The nodes of WordNet are synsets. Links between two nodes are either conceptual-semantic (bird, feather) or lexical (feather, feathery). Lexical links subsume conceptual-semantic links.
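
To make the two kinds of links concrete, here is a toy model in plain Python. It assumes a simplified layout: semantic links join whole synsets, while lexical links join one lemma in one synset to a lemma in another. The synset names, the `has_part` and `derivation` labels, and the helper `related_synsets` are hypothetical and chosen for this sketch. Real WordNet data is stored and accessed differently.

```python
# Toy model of WordNet's two kinds of links (hypothetical data, not real WordNet).
# Nodes are synsets; each synset holds the lemmas (word forms) that express it.
synsets = {
    "bird.n.01": ["bird"],
    "feather.n.01": ["feather"],
    "feathery.a.01": ["feathery"],
}

# Conceptual-semantic links join whole synsets (concept to concept).
semantic_links = [
    ("bird.n.01", "has_part", "feather.n.01"),
]

# Lexical links join a specific lemma in one synset to a lemma in another synset.
lexical_links = [
    (("feather", "feather.n.01"), "derivation", ("feathery", "feathery.a.01")),
]

def related_synsets(synset):
    """Return the synsets reachable from `synset` through a single link.

    A lexical link between two lemmas also connects the synsets containing them.
    """
    found = {dst for src, _, dst in semantic_links if src == synset}
    found |= {dst[1] for src, _, dst in lexical_links if src[1] == synset}
    return found

print(related_synsets("bird.n.01"))     # {'feather.n.01'}
print(related_synsets("feather.n.01"))  # {'feathery.a.01'}
```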

• Could you explain WordNet's synsets with an example?

Consider the word 'bike'. It has multiple meanings: a motorcycle (noun), a bicycle (noun), or to ride a bicycle (verb). WordNet represents these as three synsets with unique names: motorcycle.n.01, bicycle.n.01 and bicycle.v.01.

Each synset has a list of lemma names that share the same concept. Thus, motorcycle.n.01 contains the words 'motorcycle' and 'bike'. Each synset also has a definition, called a gloss. It now becomes clear that the word 'bike' must be present in all three synsets, each representing a different concept or meaning.
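
The sketch below shows how one word maps to several synsets. It uses a hand-built subset of the 'bike' example: the synset names come from the text, but the lemma lists are trimmed to what the text mentions and the glosses are short paraphrases written for this sketch, not WordNet's wording. The helpers `build_index` and `senses` are hypothetical.

```python
from collections import defaultdict

# Hand-built subset of the 'bike' example (glosses are paraphrases, not WordNet text).
SYNSETS = {
    "motorcycle.n.01": {"lemmas": ["motorcycle", "bike"],
                        "gloss": "a two-wheeled motor vehicle"},
    "bicycle.n.01": {"lemmas": ["bicycle", "bike"],
                     "gloss": "a two-wheeled vehicle moved by pedals"},
    "bicycle.v.01": {"lemmas": ["bicycle", "bike"],
                     "gloss": "to ride a bicycle"},
}

def build_index(synsets):
    """Map each lemma name to every synset (sense) that contains it."""
    index = defaultdict(list)
    for name, entry in synsets.items():
        for lemma in entry["lemmas"]:
            index[lemma].append(name)
    return dict(index)

def senses(index, word, pos=None):
    """Return synset names for `word`, optionally filtered by part of speech."""
    names = index.get(word, [])
    # Synset names have the form lemma.pos.nn, so the POS is the middle field.
    return [n for n in names if pos is None or n.split(".")[1] == pos]

index = build_index(SYNSETS)
print(senses(index, "bike"))           # ['motorcycle.n.01', 'bicycle.n.01', 'bicycle.v.01']
print(senses(index, "bike", pos="v"))  # ['bicycle.v.01']
for name in senses(index, "bike"):
    print(name, "-", SYNSETS[name]["gloss"])
```

With NLTK, the corresponding lookup on the real database is `wordnet.synsets("bike")`, which returns Synset objects instead of plain names.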

It's also possible to move from one synset to another by certain relations. For example, motor_vehicle.n.01 is a more general concept of motorcycle.n.01 whereas minibike.n.01 and trail_bike.n.01 are more specific concepts of motorcycle.n.01. Thus, synsets are linked by relations making WordNet a network of conceptually related words.
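
Moving between general and specific concepts amounts to walking is-a links. The sketch assumes each synset has at most one hypernym, which is a simplification because real WordNet synsets can have several. The map holds only the four synsets named above, and the helpers `hypernym_chain` and `hyponyms` are hypothetical. In NLTK, the corresponding methods on a real synset are `hypernyms()` and `hyponyms()`.

```python
# Toy hypernym map built from the example in the text (single parent per synset).
HYPERNYM = {
    "motorcycle.n.01": "motor_vehicle.n.01",
    "minibike.n.01": "motorcycle.n.01",
    "trail_bike.n.01": "motorcycle.n.01",
}

def hypernym_chain(synset):
    """Follow is-a links upward until reaching a synset with no hypernym."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def hyponyms(synset):
    """Invert the hypernym map to find the more specific concepts."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)

print(hypernym_chain("trail_bike.n.01"))
# ['trail_bike.n.01', 'motorcycle.n.01', 'motor_vehicle.n.01']
print(hyponyms("motorcycle.n.01"))
# ['minibike.n.01', 'trail_bike.n.01']
```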

• What is WordNet used for?

WordNet is typically used by linguists, psychologists and those working in the fields of AI and NLP. Among its many applications are word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and machine translation.

WordNet can be used as a thesaurus except that words are organized by concept and semantic/lexical relations. In NLP, WordNet has become a useful tool for word sense disambiguation. When a word has multiple senses, WordNet can help in identifying the correct sense. WordNet's symbolic approach complements statistical approaches.
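
A classic way to use WordNet's glosses for disambiguation is the simplified Lesk algorithm. It picks the sense whose gloss shares the most content words with the surrounding text. The sketch below does not use real WordNet data. It relies on two hypothetical, hand-written glosses for 'bank' with made-up sense labels, plus a tiny stopword list chosen just for this example.

```python
import re

# Hypothetical, paraphrased glosses for two senses of "bank" (not WordNet text).
GLOSSES = {
    "river_bank": "sloping land beside a body of water such as a river",
    "financial_bank": "a financial institution that accepts deposits and lends money",
}

# Minimal stopword list for this example only.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "on", "in", "at", "we"}

def tokens(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, glosses):
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = tokens(context)
    return max(glosses, key=lambda sense: len(ctx & tokens(glosses[sense])))

print(simplified_lesk("We sat on the bank of the river and watched the water", GLOSSES))
# river_bank
print(simplified_lesk("She went to the bank to deposit money", GLOSSES))
# financial_bank
```

NLTK also ships a Lesk implementation that works on real WordNet synsets (`nltk.wsd.lesk`). Plain word overlap is brittle: 'deposit' and 'deposits' don't match here without stemming.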

Measuring similarity between words is another application. Different algorithms exist to measure word similarity, and such a measure can be used in spell checking or question answering. However, WordNet similarity is limited to noun-noun or verb-verb pairs: we can't compare nouns with verbs, nor use other parts of speech.
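
One widely used measure is Wu-Palmer similarity, 2 × depth(LCS) / (depth(a) + depth(b)), where the LCS (lowest common subsumer) is the deepest concept that is an ancestor of both words. The sketch uses a small hypothetical is-a tree, not WordNet's, with depth counted in nodes so that a root has depth 1. NLTK's `wup_similarity` may produce somewhat different numbers because it uses its own depth conventions. Putting nouns and verbs in separate trees shows why cross-POS comparison fails: the two words have no common subsumer.

```python
# Toy is-a hierarchy (hypothetical, not WordNet's). Each node maps to its single parent.
PARENT = {
    # noun tree rooted at "entity"
    "vehicle": "entity",
    "vessel": "vehicle",
    "car": "vehicle",
    "ship": "vessel",
    "boat": "vessel",
    # verb tree rooted at "move", with no link to the noun tree
    "travel": "move",
    "sail": "travel",
}

def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Depth counted in nodes, so a root has depth 1."""
    return len(ancestors(node))

def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of both a and b, or None if none exists."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):  # walks upward from a, so the first hit is the deepest
        if node in b_ancestors:
            return node
    return None

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    if lcs is None:
        return None  # e.g. a noun compared with a verb
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("ship", "boat"))           # 0.75
print(round(wu_palmer("ship", "car"), 3))  # 0.571
print(wu_palmer("ship", "sail"))           # None
```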

Where neural networks are used for NLP, words are represented as word embeddings (low-dimensional vectors). However, a word embedding doesn't distinguish between the different senses of a word. WordNet has been applied to create sense embeddings, with one vector per sense.
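
One simple way to derive sense vectors is to average the word vectors of the content words in each sense's gloss. This approach is assumed here for illustration and is not necessarily how any particular published method works. The 3-dimensional vectors and the gloss word lists are hypothetical; real embeddings have hundreds of dimensions. The sketch then picks the sense whose vector is closest, by cosine similarity, to the averaged context.

```python
import math

# Hypothetical 3-dimensional word vectors.
WORD_VEC = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.0],
    "land": [0.7, 0.0, 0.1],
    "money": [0.0, 0.9, 0.2],
    "deposit": [0.1, 0.8, 0.1],
    "loan": [0.0, 0.7, 0.3],
}

# Content words taken from hypothetical glosses of two senses of "bank".
SENSE_GLOSS_WORDS = {
    "bank/river": ["river", "water", "land"],
    "bank/finance": ["money", "deposit", "loan"],
}

def mean(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One vector per sense, instead of a single vector for the ambiguous word "bank".
SENSE_VEC = {s: mean([WORD_VEC[w] for w in ws]) for s, ws in SENSE_GLOSS_WORDS.items()}

def closest_sense(context_words):
    """Average the known context vectors and return the most similar sense."""
    ctx = mean([WORD_VEC[w] for w in context_words if w in WORD_VEC])
    return max(SENSE_VEC, key=lambda s: cosine(ctx, SENSE_VEC[s]))

print(closest_sense(["deposit", "money"]))  # bank/finance
print(closest_sense(["water"]))             # bank/river
```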

• What are major lexical relations captured in WordNet?

Major lexical relations include the following:

• Synonymy: Synonyms are words that have similar meanings. Often context determines which synonym is best suited.
• Polysemy: Polysemous words have more than one sense. The word bank can mean a river bank, a place where money is stored, a bank building, or a financial institution. Polysemy is related to homonymy and metonymy.
• Hyponymy/Hypernymy: Is-a relation. Robin is a hyponym of bird since robin is a type of bird. Likewise, bird is a hypernym of robin. Thus, hypernyms are synsets that are more general whereas hyponyms are more specific.
• Meronymy/Holonymy: Part-whole relation. Beak is a meronym of bird since a beak is part of a bird's anatomy. Likewise, bird is a holonym of beak. WordNet distinguishes three types of part-whole relations: components (leg, table), constituents (oxygen, water), and members (parent, family).
• Antonymy: Lexical opposites such as (large, small).
• Troponymy: Applicable for verbs. For example, whisper is a troponym of talk since whisper elaborates on the manner of talking.
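
Several of the relations above come in inverse pairs. If robin has hypernym bird, then bird has hyponym robin, and antonymy is its own inverse. The sketch stores each fact once, using examples from the list, and derives the reverse direction automatically. The relation labels and the `build_graph` helper are hypothetical. Synonymy and polysemy are left out because they describe how words map to synsets rather than links between synsets.

```python
from collections import defaultdict

# Inverse of each relation; antonymy is symmetric.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "holonym": "meronym",
    "meronym": "holonym",
    "antonym": "antonym",
}

# Each fact is stated once, in one direction, using examples from the list above.
FACTS = [
    ("robin", "hypernym", "bird"),  # bird is a hypernym of robin
    ("beak", "holonym", "bird"),    # bird is a holonym of beak
    ("large", "antonym", "small"),
]

def build_graph(facts):
    """Store every fact in both directions so it can be queried from either end."""
    graph = defaultdict(lambda: defaultdict(set))
    for src, rel, dst in facts:
        graph[src][rel].add(dst)
        graph[dst][INVERSE[rel]].add(src)
    return graph

graph = build_graph(FACTS)
print(graph["bird"]["hyponym"])   # {'robin'}
print(graph["bird"]["meronym"])   # {'beak'}
print(graph["small"]["antonym"])  # {'large'}
```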

• What are some limitations of WordNet?

WordNet doesn't include syntactic information, although later work showed that, at least for verbs, there's a correlation between semantic makeup and syntactic behaviour.

Semantic relations are better suited to concrete concepts: for example, tree is a hypernym of conifer. They are less suited to abstract concepts such as fear or happiness, where it's hard to identify hyponym/hypernym relations. Some relations may also be language specific, which can make wordnets for different languages less interoperable.

WordNet's senses are sometimes too fine-grained for automatic sense disambiguation. One possible solution is to group related senses.

WordNet doesn't include etymology, so word origins and how words have evolved over time are not captured. Offensive words are included, and it's left to applications to decide what's offensive, since meanings change over time. Pronunciation is missing, and there's limited information about usage. WordNet covers most of everyday English but doesn't include domain-specific terminology.

WordNet was created in the mid-1980s, when digital corpora were hard to come by. It was therefore assembled from the intuitions of lexicographers rather than induced from a corpus.

## Milestones

1928

Murray’s Oxford English Dictionary (OED) is compiled "on historical principles". By focusing on historical evidence, OED, like other standard dictionaries, neglects questions concerning the synchronic organization of lexical knowledge.

1969

Collins and Quillian propose a hierarchical semantic memory model for storing information in computer systems. They hypothesize that human memory is in fact organized in this manner. They test their hypothesis by measuring retrieval times. For example, if a person is asked if a canary can fly, the actual retrieval might involve inference from a memory that contains "canary is a bird" and "birds can fly". This important work goes on to influence the creation of WordNet almost two decades later.
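
The canary example can be sketched as property inheritance. Each property is stored only at the most general concept it applies to, and a query climbs is-a links until it finds the property. The model predicts that more links climbed means a longer retrieval time. The code only counts the links; it doesn't model time. The concepts and properties below are illustrative values chosen for this sketch, and `lookup` is a hypothetical helper.

```python
# Illustrative is-a links and properties, each property stored at one level only.
ISA = {"canary": "bird", "bird": "animal"}
PROPERTIES = {
    "canary": {"sing", "is yellow"},
    "bird": {"fly", "has wings"},
    "animal": {"breathe", "eat"},
}

def lookup(concept, prop):
    """Return (found, levels), where levels counts is-a links climbed."""
    node, levels = concept, 0
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return True, levels
        node = ISA.get(node)  # move up to the more general concept
        levels += 1
    return False, levels

print(lookup("canary", "sing"))     # (True, 0)
print(lookup("canary", "fly"))      # (True, 1) via "canary is a bird"
print(lookup("canary", "breathe"))  # (True, 2)
```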

1976

Miller and Johnson-Laird propose psycholexicology, a study of the lexical component of language, which is about words and vocabulary of a language.

1985

Some psychologists and linguists at Princeton University start developing a lexical database. While a dictionary helps us search for words alphabetically, a lexical database allows us to search based on concepts. This marks the beginning of Princeton WordNet. We can say that it's a dictionary based on psycholinguistic principles.

1991

WordNet 1.0 is released.

1996

EuroWordNet is started as an EU project covering Dutch, Spanish and Italian. It's inspired by, and designed to link to, the Princeton WordNet. In 1997, more languages are added: German, French, Czech and Estonian. The project is completed towards the end of 1999. One novel feature is the Inter-Lingual-Index (ILI), which defines equivalence relations between synsets in different languages. In later years, this work is extended by other projects: EUROTERM, BALKANET, and MEANING. By 2006, it's noted that such databases exist for 35 languages globally.

Mar 2005

WordNet 2.1 is released. There's support for UNIX-like systems and Windows. WordNet 2.1 contains almost 118,000 synsets, comprising more than 81,000 noun synsets, 13,600 verb synsets, 19,000 adjective synsets, and 3,600 adverb synsets.

Dec 2006

WordNet 3.0 is released. This release has 117,798 nouns, 11,529 verbs, 22,479 adjectives, and 4,481 adverbs. The average noun has 1.23 senses, and the average verb has 2.16 senses.

Jun 2011

WordNet 3.1 is released. It's available only online, though it's possible to download just the database and use it with the 3.0 installation. This version contains 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs. Its compressed size is 12MB.

Jun 2018

Under the guidance of the Global WordNet Association, the English WordNet is created on GitHub as a fork of the Princeton WordNet 3.1. Annual updates of this resource happen in April 2019 and April 2020.

## Sample Code

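```python
# Source: https://pythonprogramming.net/wordnet-nltk-tutorial/
# Accessed 2020-08-02

import nltk
from nltk.corpus import wordnet

# The WordNet data must be downloaded once before first use.
nltk.download("wordnet")

synonyms = []
antonyms = []

# Collect the lemma names of every sense of "good", plus one antonym per lemma.
for syn in wordnet.synsets("good"):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))

# Wu-Palmer similarity between the synsets ship.n.01 and boat.n.01.
w1 = wordnet.synset("ship.n.01")
w2 = wordnet.synset("boat.n.01")
print(w1.wup_similarity(w2))
```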

## Cite As

Devopedia. 2020. "WordNet." Version 17, August 3. Accessed 2022-09-22. https://devopedia.org/wordnet
Contributed by 2 authors

Last updated on 2020-08-03 12:29:32