WordNet is a database of words in the English language. Unlike a dictionary, which is organized alphabetically, WordNet is organized by concept and meaning. Traditional dictionaries were created for human readers, but computational applications need a lexical resource better suited to machines. This is where WordNet becomes useful.
WordNet is a network of words linked by lexical and semantic relations. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, called synsets, each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the WordNet browser.
WordNet is freely and publicly available for download.
What's the distinction between WordNet and a thesaurus?
A thesaurus provides similar words (synonyms) and opposites (antonyms). WordNet does much more than this. Through synsets, WordNet interlinks specific senses of words rather than just word forms. As a result, words that are found in close proximity to one another in the network are semantically disambiguated.
A synset is also linked to other synsets by semantic relations. Such relations are missing in a thesaurus. These relations are based on concepts and therefore give us valuable information about words. For example, the verbs (communicate, talk, whisper) are all about talking but the manner goes from general to specific. A similar example with nouns would be (furniture, bed, bunkbed). An example of a part-whole relation is (leg, chair). These sorts of relations are captured in WordNet.
Could you explain WordNet's synsets with an example?
Consider the word 'bike'. It has multiple meanings: it could be a motorcycle (noun), a bicycle (noun), or to ride a bicycle (verb). WordNet represents these as three synsets, each with a unique name of the form lemma.pos.nn, such as motorcycle.n.01.
Each synset has an array of lemma names that share the same concept. Thus, motorcycle.n.01 has the words 'motorcycle' and 'bike'. The synset also has a definition, also called a gloss. It now becomes clear that the word 'bike' must be present in all three synsets, each representing a different concept or meaning.
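To make the 'bike' example concrete, here is a minimal sketch that models synsets as plain Python records, assuming each one needs only a name, its lemma names and a gloss. The glosses are paraphrased, and the names bicycle.n.01 and bicycle.v.01 are assumptions for illustration; only motorcycle.n.01 and its lemma names come from the text. With NLTK and its WordNet data installed, wn.synsets('bike') and the Synset methods lemma_names() and definition() return the real entries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    name: str                     # "lemma.pos.nn", e.g. "motorcycle.n.01"
    lemma_names: tuple[str, ...]  # words that express this one concept
    definition: str               # the gloss


# Toy data: glosses are paraphrased, and the names bicycle.n.01 and
# bicycle.v.01 are assumed for illustration. Only motorcycle.n.01 and its
# lemma names come from the text.
SYNSETS = [
    Synset("motorcycle.n.01", ("motorcycle", "bike"), "a two-wheeled motor vehicle"),
    Synset("bicycle.n.01", ("bicycle", "bike"), "a two-wheeled vehicle moved by pedals"),
    Synset("bicycle.v.01", ("bicycle", "bike"), "to ride a bicycle"),
]


def lookup(word: str, pos: str | None = None) -> list[Synset]:
    """Return every synset listing `word` as a lemma, optionally filtered by part of speech."""
    return [
        s for s in SYNSETS
        if word in s.lemma_names and (pos is None or s.name.split(".")[1] == pos)
    ]


# One word form, three senses: 'bike' appears in every synset above.
for s in lookup("bike"):
    print(s.name, s.lemma_names, "-", s.definition)
```

Filtering on the middle part of the name mirrors how the part of speech is encoded in a synset name, so lookup("bike", pos="v") returns only the verb sense.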
It's also possible to move from one synset to another by certain relations. For example, motor_vehicle.n.01 is a more general concept of motorcycle.n.01, while synsets such as trail_bike.n.01 are more specific concepts of motorcycle.n.01. Thus, synsets are linked by relations, making WordNet a network of conceptually related words.
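Moving between synsets amounts to following typed links in a graph. The sketch below assumes a toy table of is-a links containing only the three synsets named above and gives each synset at most one hypernym; real WordNet continues upward to entity.n.01 and allows multiple hypernyms. In NLTK, the corresponding Synset methods are hypernyms() and hyponyms().

```python
# Toy is-a links (child -> more general parent). Only the synsets named in
# the text are included; real WordNet continues upward to entity.n.01 and
# lets a synset have more than one hypernym.
HYPERNYM = {
    "trail_bike.n.01": "motorcycle.n.01",
    "motorcycle.n.01": "motor_vehicle.n.01",
}


def hypernyms(name: str) -> list[str]:
    """More general synsets (at most one in this toy model)."""
    parent = HYPERNYM.get(name)
    return [parent] if parent else []


def hyponyms(name: str) -> list[str]:
    """More specific synsets: every child whose parent is `name`."""
    return [child for child, parent in HYPERNYM.items() if parent == name]


def hypernym_path(name: str) -> list[str]:
    """Follow is-a links upward from `name` to the most general known synset."""
    path = [name]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


print(hypernyms("motorcycle.n.01"))
print(hyponyms("motorcycle.n.01"))
print(hypernym_path("trail_bike.n.01"))
```

Storing only the child-to-parent direction keeps the data small; hyponyms are recovered by scanning for children, which is fine at this size but would call for an inverse index on a full lexicon.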
What is WordNet used for?
WordNet is typically used by linguists, psychologists, and those working in the fields of AI and NLP. Its many applications include word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and machine translation.
WordNet can be used as a thesaurus except that words are organized by concept and semantic/lexical relations. In NLP, WordNet has become a useful tool for word sense disambiguation. When a word has multiple senses, WordNet can help in identifying the correct sense. WordNet's symbolic approach complements statistical approaches.
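One classic way to use glosses for word sense disambiguation is the simplified Lesk algorithm, which the text doesn't name and which is swapped in here purely as an illustration: pick the sense whose gloss shares the most words with the surrounding context. The sense labels, glosses and stop-word list below are hypothetical, not WordNet's actual entries; real code would read glosses from WordNet (NLTK also provides nltk.wsd.lesk for this purpose).

```python
import re

# Hypothetical sense inventory for "bank"; labels and glosses are made up
# for illustration and are not WordNet's actual entries.
SENSES = {
    "bank#river": "sloping land beside a body of water such as a river",
    "bank#money": "a financial institution that accepts deposits of money and makes loans",
}

# A tiny, assumed stop-word list so that function words don't count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "on",
             "to", "in", "he", "she", "his", "her", "at"}


def tokens(text: str) -> set[str]:
    """Lowercase content words of `text`."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(senses: dict[str, str], context: str) -> str:
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = tokens(context)
    return max(senses, key=lambda sense: len(tokens(senses[sense]) & ctx))


print(simplified_lesk(SENSES, "She sat on the bank of the river and watched the water"))
print(simplified_lesk(SENSES, "He deposited money at the bank"))
```

Exact word overlap is crude ("deposited" does not match "deposits"), which is one reason practical systems add stemming or extend glosses with those of related synsets.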
Measuring similarity between words is another application. Different algorithms exist to measure word similarity. Such a similarity measure can be used in spell checking or question answering. However, WordNet-based similarity is limited to noun-noun or verb-verb comparisons, because only nouns and verbs are organized into is-a hierarchies. We can't compare nouns with verbs, or use other parts of speech.
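A widely used WordNet-based measure is path similarity, which scores two concepts as 1 / (1 + d), where d is the number of is-a edges on the shortest path between them. The sketch below computes it over a tiny hand-built taxonomy with one parent per concept; these links are simplified for illustration and are not WordNet's real hierarchy. It returns None across parts of speech, echoing the noun-noun/verb-verb restriction. On real data, NLTK exposes this measure as Synset.path_similarity().

```python
from __future__ import annotations

# Toy single-parent is-a trees, one per part of speech (child -> parent).
# Simplified for illustration; not WordNet's actual hierarchy.
TAXONOMY = {
    "n": {"robin": "bird", "sparrow": "bird", "bird": "animal", "dog": "animal"},
    "v": {"whisper": "talk", "mutter": "talk", "talk": "communicate"},
}


def path_to_root(word: str, pos: str) -> list[str]:
    """`word` followed by each of its ancestors, most specific first."""
    path = [word]
    while path[-1] in TAXONOMY[pos]:
        path.append(TAXONOMY[pos][path[-1]])
    return path


def path_similarity(w1: str, pos1: str, w2: str, pos2: str) -> float | None:
    """1 / (1 + edges on the shortest is-a path); None if no path exists."""
    if pos1 != pos2:
        return None  # nouns and verbs live in separate hierarchies
    up1, up2 = path_to_root(w1, pos1), path_to_root(w2, pos2)
    common = [node for node in up1 if node in up2]
    if not common:
        return None
    lowest = common[0]  # first shared ancestor walking up from w1
    distance = up1.index(lowest) + up2.index(lowest)
    return 1 / (distance + 1)


print(path_similarity("robin", "n", "sparrow", "n"))  # path via "bird"
print(path_similarity("robin", "n", "dog", "n"))      # path via "animal"
print(path_similarity("robin", "n", "talk", "v"))     # different parts of speech
```

In a tree, the first ancestor of w1 that also appears above w2 is their lowest common ancestor, so summing the two climbs gives the shortest path length.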
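In neural network approaches to NLP, words are represented by word embeddings (dense, low-dimensional vectors). However, a word embedding doesn't distinguish between a word's senses: 'bank' gets the same vector whether it means a river bank or a financial institution. WordNet has been used to create sense embeddings, which assign a separate vector to each sense.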
What are major lexical relations captured in WordNet?
- Synonymy: Synonyms are words that have similar meanings. Often context determines which synonym is best suited.
- Polysemy: Polysemous words have more than one sense. The word bank can mean a river bank, a place where money is stored, a building, or an institution. Polysemy is associated with the terms homonymy and metonymy.
- Hyponymy/Hypernymy: Is-a relation. Robin is a hyponym of bird since robin is a type of bird. Likewise, bird is a hypernym of robin. Thus, hypernyms are synsets that are more general whereas hyponyms are more specific.
- Meronymy/Holonymy: Part-whole relation. Beak is a meronym of bird since a beak is part of a bird's anatomy. Likewise, bird is a holonym of beak. WordNet identifies three types of part-whole relations: components (leg, table), constituents (oxygen, water), and members (parent, family).
- Antonymy: Lexical opposites such as (large, small).
- Troponymy: Applies to verbs. For example, whisper is a troponym of talk since whisper specifies the manner of talking.
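Most of the relations above come in inverse pairs. A minimal sketch, assuming a simple in-memory store keyed by plain words rather than synsets, records each link once and derives its inverse automatically. Keying everything by words is a simplification: WordNet links synsets for semantic relations such as hypernymy and links individual word forms for lexical relations such as antonymy.

```python
from collections import defaultdict

# add(a, rel, b) reads "a is a REL of b"; the inverse link is stored too.
INVERSE = {
    "hyponym": "hypernym", "hypernym": "hyponym",
    "meronym": "holonym", "holonym": "meronym",
    "troponym": "hypernym",  # for verbs, the more general verb is the hypernym
    "antonym": "antonym",    # symmetric relation
}


class RelationStore:
    def __init__(self) -> None:
        self.links = defaultdict(set)  # (word, relation) -> related words

    def add(self, a: str, rel: str, b: str) -> None:
        """Record "a is a REL of b" and the matching inverse link."""
        self.links[(a, rel)].add(b)
        self.links[(b, INVERSE[rel])].add(a)

    def related(self, word: str, rel: str) -> list[str]:
        """Words x such that "word is a REL of x", sorted for stable output."""
        return sorted(self.links.get((word, rel), set()))


store = RelationStore()
store.add("robin", "hyponym", "bird")    # robin is a type of bird
store.add("beak", "meronym", "bird")     # a beak is part of a bird
store.add("leg", "meronym", "table")     # component part-whole relation
store.add("large", "antonym", "small")   # lexical opposites
store.add("whisper", "troponym", "talk") # whispering is a manner of talking

print(store.related("bird", "hypernym"))  # bird is a hypernym of ...
print(store.related("bird", "holonym"))   # bird is a holonym of ...
```

Deriving inverses at insertion time keeps the two directions consistent, at the cost of storing every link twice.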
What are some limitations of WordNet?
Semantic relations are better suited to concrete concepts; for example, tree is a hypernym of conifer. They're less suited to abstract concepts such as fear or happiness, where it's hard to identify hyponym/hypernym relations. Some relations may also be language-specific, which can make wordnets for different languages less interoperable.
WordNet doesn't include etymology, so word origins and how they've evolved over time are not captured. Offensive words are included, and it's left to applications to decide what's offensive, since meanings change over time. Pronunciation is missing. There's limited information about usage. WordNet covers most of everyday English but doesn't include domain-specific terminology.
Collins and Quillian propose a hierarchical semantic memory model for storing information in computer systems. They hypothesize that human memory is in fact organized in this manner. They test their hypothesis by measuring retrieval times. For example, if a person is asked if a canary can fly, the actual retrieval might involve inference from a memory that contains "canary is a bird" and "birds can fly". This important work goes on to influence the creation of WordNet almost two decades later.
Some psychologists and linguists at Princeton University start developing a lexical database. While a dictionary helps us search for words alphabetically, a lexical database allows us to search based on concepts. This marks the beginning of Princeton WordNet. We can say that it's a dictionary based on psycholinguistic principles.
EuroWordNet is started as an EU project covering Dutch, Spanish and Italian. It's inspired by the Princeton WordNet and is designed to link to it. In 1997, more languages are added: German, French, Czech and Estonian. The project is completed towards the end of 1999. One novel feature is the Inter-Lingual-Index (ILI), which defines equivalence relations between synsets in different languages. In later years, this work is extended by other projects: EUROTERM, BALKANET, and MEANING. By 2006, it's noted that such databases exist for 35 languages globally.
- Abd-Elwasaa, Ahmed. 2016. "WORDNET: A Database of Lexical Relations." SlideShare, January 2. Accessed 2020-08-02.
- Camacho-Collados, Jose, and Mohammad Taher Pilehvar. 2018. "From Word to Sense Embeddings: A Survey on Vector Representations of Meaning." arXiv, v3, October 26. Accessed 2020-08-03.
- Collins, Allan M. and M. Ross Quillian. 1969. "Retrieval Time from Semantic Memory." J. Verbal Learning and Verbal Behavior, vol. 8, no. 2, pp. 240-247, April. Accessed 2020-08-03.
- Educative. 2020. "How to use WordNet in Python." Educative, Inc. Accessed 2020-08-03.
- Fellbaum, C. 2006. "WordNet(s)." In: Keith Brown (ed), Encyclopedia of Language & Linguistics, Second Edition, vol. 13, pp. 665-670. Oxford: Elsevier. Accessed 2020-08-02.
- Fellbaum, C. 2012. "WordNet." First Interdisciplinary Summer School on Ontological Analysis, Trento, Italy, July 16-20. Accessed 2020-08-03.
- Fellbaum, C. and G. A. Miller. 2006. "Whither WordNet?" Presentation, LREC. Accessed 2020-08-03.
- Global WordNet Association. 2020. "English WordNet: Releases." GitHub, April 17. Accessed 2020-08-03.
- Jurafsky, Daniel and James H. Martin. 2019. "WordNet: Word Relations, Senses, and Disambiguation." Chapter C in: Speech and Language Processing, Third Edition draft, October 16. Accessed 2020-08-03.
- Khoi, Nguyen. 2012. "WordNet Introduction." SlideShare, October 31. Accessed 2020-08-02.
- Miller, George A. 1995. "WordNet: A Lexical Database for English." Comm. of the ACM, vol. 38, no. 11, pp. 39-41, November. Accessed 2020-08-02.
- Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. 1993. "Introduction to WordNet." August. Accessed 2020-08-02.
- Princeton University. 2020a. "WordNet: A Lexical Database for English." Princeton University. Accessed 2020-08-02.
- Princeton University. 2020b. "Current Version." WordNet Download, Princeton University. Accessed 2020-08-02.
- Raj, Govind. 2013. "WordNet: A Lexical Knowledgebase." SlideShare, November 1. Accessed 2020-08-02.
- Vossen, P. 2002. "WordNet, EuroWordNet and Global WordNet." Revue française de linguistique appliquée, vol. vii(1), pp. 27-38. doi:10.3917/rfla.071.0027. Accessed 2020-08-03.
- Wikipedia. 2020. "WordNet." Wikipedia, July 27. Accessed 2020-08-02.
- WordNet Docs. 2020. "wnstats(7WN)." WordNet Docs, Princeton University. Accessed 2020-08-03.
- Fellbaum, Christiane (ed). 1998. "WordNet: An Electronic Lexical Database." Cambridge, MA: MIT Press. Accessed 2020-08-02.
- Loria, Steven. 2013. "Tutorial: What is WordNet? A Conceptual Introduction Using Python." September 30. Updated 2014-10-26. Accessed 2020-08-02.
- sentdex. 2015. "WordNet - Natural Language Processing With Python and NLTK (part 10)." sentdex, on YouTube, May 11. Accessed 2020-08-02.
- W3C. 2007. "WordNet RDF/OWL Files." Revision 1.3, W3C, January 10. Accessed 2020-08-02.
- Synthetic Intelligence Network. 2020. "WordNet Tutorial." Tutorial, Synthetic Intelligence Network. Accessed 2020-08-02.
- Bitter, Christian. 2011. "F#-Querying WordNet Online." Blog, Microsoft Docs, October 6. Accessed 2020-08-02.