WordNet

WordNet Browser. Source: Wikipedia 2020.

WordNet is a database of words in the English language. Unlike a dictionary that's organized alphabetically, WordNet is organized by concept and meaning. Traditional dictionaries were created for humans, whereas computers need a lexical resource that's better suited to automated processing. This is where WordNet becomes useful.

WordNet is a network of words linked by lexical and semantic relations. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, called synsets, each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the WordNet browser.

WordNet is freely and publicly available for download.

Discussion

  • What's the distinction between WordNet and a thesaurus?
    Lexical Relations. Source: Abd-Elwasaa 2016.

    A thesaurus provides similar words (synonyms) and opposites (antonyms). WordNet does much more than this. Via synsets, WordNet brings together specific word senses. As a result, words that are found in close proximity to one another in the network are semantically disambiguated.

    A synset is also linked to other synsets by semantic relations. Such relations are missing in a thesaurus. These relations are based on concepts and therefore give us valuable information about words. For example, the verbs (communicate, talk, whisper) are all about talking but the manner goes from general to specific. A similar example with nouns would be (furniture, bed, bunkbed). An example of a part-whole relation is (leg, chair). These sorts of relations are captured in WordNet.

    The nodes of WordNet are synsets. Links between two nodes are either conceptual-semantic (bird, feather) or lexical (feather, feathery). Lexical links subsume conceptual-semantic links.

  • Could you explain WordNet's synsets with an example?
    Synsets of the word 'bike'. Source: Educative 2020.

    Consider the word 'bike'. It has multiple meanings. It could be a motorcycle (noun), a bicycle (noun) or bicycle (verb). WordNet represents these as three synsets with unique names: motorcycle.n.01, bicycle.n.01 and bicycle.v.01.

    Each synset has an array of lemma names that share the same concept. Thus, motorcycle.n.01 has the words 'motorcycle' and 'bike'. The synset also has a definition, also called gloss. It now becomes clear that the word 'bike' must be present in all the three synsets, each representing a different concept or meaning.

    It's also possible to move from one synset to another by certain relations. For example, motor_vehicle.n.01 is a more general concept of motorcycle.n.01 whereas minibike.n.01 and trail_bike.n.01 are more specific concepts of motorcycle.n.01. Thus, synsets are linked by relations making WordNet a network of conceptually related words.
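The 'bike' example can be sketched as a small data structure. This is a hand-written toy model in plain Python, not data read from WordNet: the synset names and links come from the text above, the lemma lists are abbreviated to the words mentioned there, and the helper names `synsets_for` and `hyponyms_of` are hypothetical.

```python
# Toy model of the synsets discussed above (hand-written, NOT read from WordNet).
# Each synset lists its lemma names and the names of its more general synsets.
SYNSETS = {
    "motorcycle.n.01": {"lemmas": ["motorcycle", "bike"],
                        "hypernyms": ["motor_vehicle.n.01"]},
    "bicycle.n.01": {"lemmas": ["bicycle", "bike"], "hypernyms": []},
    "bicycle.v.01": {"lemmas": ["bicycle", "bike"], "hypernyms": []},
    "motor_vehicle.n.01": {"lemmas": ["motor_vehicle"], "hypernyms": []},
    "minibike.n.01": {"lemmas": ["minibike"],
                      "hypernyms": ["motorcycle.n.01"]},
    "trail_bike.n.01": {"lemmas": ["trail_bike"],
                        "hypernyms": ["motorcycle.n.01"]},
}


def synsets_for(word):
    """Return the names of all synsets that list `word` as a lemma."""
    return sorted(name for name, s in SYNSETS.items() if word in s["lemmas"])


def hyponyms_of(name):
    """Return the synsets that name `name` as a hypernym (the inverse link)."""
    return sorted(child for child, s in SYNSETS.items()
                  if name in s["hypernyms"])


print(synsets_for("bike"))                       # one entry per sense of 'bike'
print(SYNSETS["motorcycle.n.01"]["hypernyms"])   # more general concept
print(hyponyms_of("motorcycle.n.01"))            # more specific concepts
```

One word maps to several synsets, and each synset maps to several words. Storing only the hypernym link and deriving hyponyms from it is a simplification made for this sketch.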

  • What is WordNet used for?
    Path lengths can be used to compute word similarity. Source: Jurafsky and Martin 2019, fig. C.5.

    WordNet is typically used by linguists, psychologists and those working in the fields of AI and NLP. Among its many applications are word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and machine translation.

    WordNet can be used as a thesaurus except that words are organized by concept and semantic/lexical relations. In NLP, WordNet has become a useful tool for word sense disambiguation. When a word has multiple senses, WordNet can help in identifying the correct sense. WordNet's symbolic approach complements statistical approaches.

    Measuring similarity between words is another application. Different algorithms exist to measure word similarity. Such a similarity measure can be used in spell checking or question answering. However, WordNet is limited to noun-noun or verb-verb similarity. We can't compare nouns and verbs, or use other parts of speech.
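A path-based similarity measure can be sketched on a toy hierarchy. The hierarchy below is hand-written from examples in this article, not taken from WordNet, and the function names are hypothetical. The score 1 / (1 + path length) is one common definition among several; other measures weight the path differently.

```python
# Toy hypernym links (child -> parent), hand-written for illustration only.
# Nouns and verbs sit in separate hierarchies, as they do in WordNet.
HYPERNYM = {
    "bunkbed": "bed",
    "bed": "furniture",
    "chair": "furniture",
    "whisper": "talk",
    "talk": "communicate",
}


def path_to_root(concept):
    """Return the chain from `concept` up to the top of its hierarchy."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_length(a, b):
    """Number of edges between a and b via their lowest common ancestor.

    Returns None when the two concepts share no ancestor.
    """
    up_a, up_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    return None


def path_similarity(a, b):
    """Score in (0, 1]; identical concepts score 1.0, unrelated ones None."""
    distance = path_length(a, b)
    return None if distance is None else 1 / (1 + distance)


print(path_similarity("bed", "bunkbed"))    # one edge apart
print(path_similarity("bunkbed", "chair"))  # three edges apart
print(path_similarity("bed", "talk"))       # noun vs. verb: no shared path
```

The last call returns None because the noun and verb hierarchies in this toy graph never meet, which mirrors the noun-noun and verb-verb limitation noted above.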

    Where neural networks are used for NLP work, word embeddings (low-dimensional vectors) are used. However, word embeddings don't discriminate different senses. WordNet has been applied to create sense embeddings.

  • What are major lexical relations captured in WordNet?
    Semantic relations in WordNet. Source: Miller 1995, table 1.

    Major lexical relations include the following:

    • Synonymy: Synonyms are words that have similar meanings. Often context determines which synonym is best suited.
    • Polysemy: Polysemous words have more than one sense. The word bank can mean a river bank, a place where money is stored, a building, or an institution. Polysemy is associated with the terms homonymy and metonymy.
    • Hyponymy/Hypernymy: Is-a relation. Robin is a hyponym of bird since robin is a type of bird. Likewise, bird is a hypernym of robin. Thus, hypernyms are synsets that are more general whereas hyponyms are more specific.
    • Meronymy/Holonymy: Part-whole relation. Beak is a meronym of bird since a beak is part of a bird's anatomy. Likewise, bird is a holonym of beak. WordNet identifies three types of part-whole relations: components (leg, table), constituents (oxygen, water), and members (parent, family).
    • Antonymy: Lexical opposites such as (large, small).
    • Troponymy: Applicable for verbs. For example, whisper is a troponym of talk since whisper elaborates on the manner of talking.
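The relations in the list above come in inverse pairs, which a short sketch can make concrete. The data is hand-written from the examples in the list (it is not read from WordNet), and the function names are hypothetical.

```python
# Is-a links (hyponym -> hypernym), from the robin/bird example above.
HYPERNYM_OF = {"robin": "bird"}

# Part-whole links as (part, whole, kind), from the three types listed above.
PART_OF = [
    ("leg", "table", "component"),
    ("oxygen", "water", "constituent"),
    ("parent", "family", "member"),
]


def hyponyms(concept):
    """Inverse of hypernymy: the more specific concepts under `concept`."""
    return sorted(w for w, h in HYPERNYM_OF.items() if h == concept)


def holonyms(part):
    """Wholes that `part` belongs to, with the kind of part-whole relation."""
    return [(whole, kind) for p, whole, kind in PART_OF if p == part]


def meronyms(whole):
    """Inverse of holonymy: the parts of `whole`, with the relation kind."""
    return [(p, kind) for p, w, kind in PART_OF if w == whole]


print(HYPERNYM_OF["robin"])   # bird is a hypernym of robin
print(hyponyms("bird"))       # robin is a hyponym of bird
print(holonyms("leg"))        # table is a holonym of leg
print(meronyms("water"))      # oxygen is a meronym of water
```

Each link is stored once and its inverse is derived on lookup, a design choice made only to keep this sketch short.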
  • What are some limitations of WordNet?

    WordNet doesn't include syntactic information, although later work showed that at least for verbs there's correlation between semantic makeup and syntactic behaviour.

    Semantic relations are more suited to concrete concepts; for example, tree is a hypernym of conifer. They're less suited to abstract concepts such as fear or happiness, where it's hard to identify hyponym/hypernym relations. Some relations may also be language specific and can therefore make different wordnets less interoperable.

    WordNet's senses are sometimes too fine-grained for automatic sense disambiguation. One possible solution is to group related senses.

    WordNet doesn't include information about etymology. Thus, word origins and how words have evolved over time are not captured. Offensive words are also included and it's left to applications to decide what's offensive since meanings change over time. Pronunciation is missing. There's limited information about usage. WordNet covers most of everyday English but doesn't include domain-specific terminology.

    WordNet was created in the mid-1980s when digital corpora were hard to come by. It was assembled from the intuition of lexicographers rather than induced from a corpus.

Milestones

1928

Murray’s Oxford English Dictionary (OED) is compiled "on historical principles". By focusing on historical evidence, OED, like other standard dictionaries, neglects questions concerning the synchronic organization of lexical knowledge.

1969
Hypothetical memory structure of a 3-level hierarchy. Source: Collins and Quillian 1969, fig. 1.

Collins and Quillian propose a hierarchical semantic memory model for storing information in computer systems. They hypothesize that human memory is in fact organized in this manner. They test their hypothesis by measuring retrieval times. For example, if a person is asked if a canary can fly, the actual retrieval might involve inference from a memory that contains "canary is a bird" and "birds can fly". This important work goes on to influence the creation of WordNet almost two decades later.

1976

Miller and Johnson-Laird propose psycholexicology, a study of the lexical component of language, which is about words and vocabulary of a language.

1985

Some psychologists and linguists at Princeton University start developing a lexical database. While a dictionary helps us search for words alphabetically, a lexical database allows us to search based on concepts. This marks the beginning of Princeton WordNet. We can say that it's a dictionary based on psycholinguistic principles.

1991

WordNet 1.0 is released.

1996

EuroWordNet is started as an EU project covering Dutch, Spanish and Italian. It's inspired by and designed to link to the Princeton WordNet. In 1997, more languages are added: German, French, Czech and Estonian. The project is completed towards the end of 1999. One novel feature is the Inter-Lingual-Index (ILI) that defines equivalence relations between synsets in different languages. In later years, this work is extended by other projects: EUROTERM, BALKANET, and MEANING. By 2006, it's noted that databases exist for 35 languages globally.

Mar 2005

WordNet 2.1 is released. There's support for UNIX-like systems and Windows. WordNet 2.1 contains almost 118,000 synsets, comprising more than 81,000 noun synsets, 13,600 verb synsets, 19,000 adjective synsets, and 3,600 adverb synsets.

Dec 2006
Average polysemy of words in WordNet 3.0. Source: WordNet Docs 2020.

WordNet 3.0 is released. This release has 117,798 nouns, 11,529 verbs, 22,479 adjectives, and 4,481 adverbs. The average noun has 1.23 senses, and the average verb has 2.16 senses.

Jun 2011

WordNet 3.1 is released. It's available only online. It's possible to download only the database and use it with the installation from 3.0. This version contains 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs. Its compressed size is 12 MB.

Jun 2018

Under the guidance of Global WordNet Association, the English WordNet is created on GitHub as a fork of the Princeton WordNet 3.1. Annual updates of this resource happen in April 2019 and April 2020.

Sample Code

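  • # Source: https://pythonprogramming.net/wordnet-nltk-tutorial/
    # Accessed 2020-08-02
    # Requires NLTK and its WordNet corpus: run nltk.download('wordnet') once.

    from nltk.corpus import wordnet

    # Collect synonyms and antonyms of "good" across all its synsets.
    synonyms = set()
    antonyms = set()

    for syn in wordnet.synsets("good"):
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            # Antonymy is a lexical relation, so it's defined on lemmas.
            for ant in lemma.antonyms():
                antonyms.add(ant.name())

    print(synonyms)
    print(antonyms)

    # Wu-Palmer similarity between two noun synsets.
    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('boat.n.01')
    print(w1.wup_similarity(w2))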

References

  1. Abd-Elwasaa, Ahmed. 2016. "WORDNET: A Database of Lexical Relations." SlideShare, January 2. Accessed 2020-08-02.
  2. Camacho-Collados, Jose, and Mohammad Taher Pilehvar. 2018. "From Word to Sense Embeddings: A Survey on Vector Representations of Meaning." arXiv, v3, October 26. Accessed 2020-08-03.
  3. Collins, Allan M. and M. Ross Quillian. 1969. "Retrieval Time from Semantic Memory." J. Verbal Learning and Verbal Behavior, vol. 8, no. 2, pp. 240-247, April. Accessed 2020-08-03.
  4. Educative. 2020. "How to use WordNet in Python." Educative, Inc. Accessed 2020-08-03.
  5. Fellbaum, C. 2006. "WordNet(s)." In: Keith Brown (ed), Encyclopedia of Language & Linguistics, Second Edition, vol. 13, pp. 665-670. Oxford: Elsevier. Accessed 2020-08-02.
  6. Fellbaum, C. 2012. "WordNet." First Interdisciplinary Summer School on Ontological Analysis, Trento, Italy, July 16-20. Accessed 2020-08-03.
  7. Fellbaum, C. and G. A. Miller. 2006. "Whither WordNet?" Presentation, LREC. Accessed 2020-08-03.
  8. Global WordNet Association. 2020. "English WordNet: Releases." GitHub, April 17. Accessed 2020-08-03.
  9. Jurafsky, Daniel and James H. Martin. 2019. "WordNet: Word Relations, Senses, and Disambiguation." Chapter C in: Speech and Language Processing, Third Edition draft, October 16. Accessed 2020-08-03.
  10. Khoi, Nguyen. 2012. "WordNet Introduction." SlideShare, October 31. Accessed 2020-08-02.
  11. Miller, George A. 1995. "WordNet: A Lexical Database for English." Comm. of the ACM, vol. 38, no. 11, pp. 39-41, November. Accessed 2020-08-02.
  12. Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. 1993. "Introduction to WordNet." August. Accessed 2020-08-02.
  13. Princeton University. 2020a. "WordNet: A Lexical Database for English." Princeton University. Accessed 2020-08-02.
  14. Princeton University. 2020b. "Current Version." WordNet Download, Princeton University. Accessed 2020-08-02.
  15. Raj, Govind. 2013. "WordNet: A Lexical Knowledgebase." SlideShare, November 1. Accessed 2020-08-02.
  16. Vossen, P. 2002. "WordNet, EuroWordNet and Global WordNet." Revue française de linguistique appliquée, vol. vii(1), pp. 27-38. doi:10.3917/rfla.071.0027. Accessed 2020-08-03.
  17. Wikipedia. 2020. "WordNet." Wikipedia, July 27. Accessed 2020-08-02.
  18. WordNet Docs. 2020. "wnstats(7WN)." WordNet Docs, Princeton University. Accessed 2020-08-03.

Further Reading

  1. Fellbaum, Christiane (ed). 1998. "WordNet: An Electronic Lexical Database." Cambridge, MA: MIT Press. Accessed 2020-08-02.
  2. Loria, Steven. 2013. "Tutorial: What is WordNet? A Conceptual Introduction Using Python." September 30. Updated 2014-10-26. Accessed 2020-08-02.
  3. sentdex. 2015. "WordNet - Natural Language Processing With Python and NLTK (part 10)." sentdex, on YouTube, May 11. Accessed 2020-08-02.
  4. W3C. 2007. "WordNet RDF/OWL Files." Revision 1.3, W3C, January 10. Accessed 2020-08-02.
  5. Synthetic Intelligence Network. 2020. "WordNet Tutorial." Tutorial, Synthetic Intelligence Network. Accessed 2020-08-02.
  6. Bitter, Christian. 2011. "F#-Querying WordNet Online." Blog, Microsoft Docs, October 6. Accessed 2020-08-02.

Cite As

Devopedia. 2020. "WordNet." Version 17, August 3. Accessed 2024-06-25. https://devopedia.org/wordnet
Contributed by
2 authors


Last updated on
2020-08-03 12:29:32