Natural Language Parsing

Different interpretations of the structure of text. Source: Gatius 2019, slide 5.

Natural languages evolved for humans to communicate with one another. They're not in a form that can be easily processed or understood by computers. Therefore, natural language parsing is really about finding the underlying structure given an input of text. In some sense, it's the opposite of templating, where you start with a structure and then fill in the data. With parsing, you figure out the structure from the data.

Natural languages follow certain rules of grammar. This helps the parser extract the structure. Formally, we can define parsing as,

the process of determining whether a string of tokens can be generated by a grammar.

There are a number of parsing algorithms. Statistical methods have been popular since the 1980s. By the late 2010s, neural networks were being increasingly used.

Discussion

  • Could you introduce the basics of natural language parsing?
    A typical parsing process flow. Source: Chen 2018, slide 9.

    Text is a sequence of words. It can be represented simply as a "bag of words", but this discards word order and isn't very useful. We can obtain a more useful representation by making use of syntactic structure. Such structure exists because a sentence is expected to follow the rules of grammar. By extracting and representing this structure, we transform the plain input into something more useful for many downstream NLP tasks. Beyond syntax, parsing is also about obtaining the semantic structure.

    In a typical flow, input text goes into a lexical analyzer that produces individual tokens. These tokens are the input to a parser, which produces the syntactic structure at its output. When this structure is graphically represented as a tree, it's called a parse tree. A parse tree can be simplified into an intermediate representation called an Abstract Syntax Tree (AST).

    Structure can be represented either as a phrase structure tree or in labelled bracketed notation.
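
    As a minimal sketch of these representations, the labelled bracketed notation below can be loaded into NLTK's Tree class and printed as a phrase structure tree. The sentence and its Penn-style labels are illustrative, not taken from the cited sources.

    ```python
    from nltk import Tree

    # Labelled bracketed notation: each bracket pairs a constituent
    # label with its contents.
    bracketed = "(S (NP (PRP She)) (VP (VBD bought) (NP (DT a) (NN car))))"

    tree = Tree.fromstring(bracketed)   # build the tree from the bracketed string
    tree.pretty_print()                 # render the same structure as a phrase structure tree
    print(tree.leaves())                # ['She', 'bought', 'a', 'car']
    ```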

  • What are the common difficulties in natural language parsing?
    Ambiguity produces two different parse trees, shown along with bracketing notation. Source: Adapted from Bird et al. 2019, fig. 3.

    Breaking an input into sentences is the first challenge. Input could be formatted as tables or may contain page breaks. While punctuation is useful for this task, periods in abbreviations (such as "e.g." or "U.S.") can cause problems.

    Given an input, a parser should be able to pick out the main phrases. This is not a solved problem. A more difficult problem is to obtain the correct semantic relationships and understand the context of discussion. Word embeddings such as word2vec operate at word level. This work needs to be extended to phrases.

    Annotated corpora, and the neural network models trained on them, are often based on newswire content. Applying them to specialized domains such as medicine is problematic.

    Unlike parsing computer languages, parsing natural languages is more challenging because there's often ambiguity in human communication. A well-known example is "I shot an elephant in my pajamas." Was I or was the elephant wearing my pajamas? Humans also use sarcasm, colloquial phrases, idioms, and metaphors. They may also communicate with grammatical or spelling errors.
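
    The pajamas ambiguity can be reproduced with NLTK's chart parser. The toy grammar below is adapted from Bird et al. 2019 (ch. 8) and licenses both readings, so the parser returns two trees.

    ```python
    import nltk

    # Toy grammar adapted from Bird et al. 2019, ch. 8. The PP "in my pajamas"
    # can attach either to the NP (elephant) or to the VP (shot).
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pajamas'
    V -> 'shot'
    P -> 'in'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I shot an elephant in my pajamas".split()):
        print(tree)   # prints both parse trees
    ```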

  • Could you compare constituency parsing with dependency parsing?
    Phrase structure (left) and dependency structure (right). Source: Choi 2009, slide 4.

    Constituency parsing and dependency parsing are respectively based on Phrase Structure Grammar (PSG) and Dependency Grammar (DG). Dependency parsing in particular is known to be useful in many NLP applications.

    PSG breaks a sentence into its constituents or phrases. These phrases are in turn broken into more phrases. Thus, the parse tree is recursive. On the other hand, DG is not recursive, implying that phrasal nodes are not produced. Rather, it identifies a network of relations. Two lexical items are asymmetrically related. One of them is the dependent word, the other is the head or governing word. Relations are labelled.

    Consider the sentence, "She bought a car". In constituency parsing, "bought a car" is a verb phrase, which in turn contains noun phrase "a car". Thus, the structure is recursive. With dependency parsing, a directed arc identifies the relation along with a label. Typically, arrows go from head to dependent.

    Languages such as Finnish that allow greater freedom of word order benefit particularly from dependency representation. This doesn't mean that English can't benefit from it as well.
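
    For the same sentence, a dependency parse can be obtained with spaCy. This is a minimal sketch; it assumes the small English model en_core_web_sm is installed, and the exact relation labels depend on the model's label scheme.

    ```python
    import spacy

    # Assumes the model has been installed: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("She bought a car")

    # Each token points to its head word; the root's head is itself.
    for token in doc:
        print(f"{token.text:10} {token.dep_:10} head: {token.head.text}")
    # Expected roughly: She (nsubj), bought (ROOT), a (det), car (dobj or obj)
    ```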

  • What is shallow parsing and how is it useful?
    An example of shallow parsing. Source: Ananiadou et al. 2010, slide 40.

    Constituency parsing is complex. Traditionally, such full parsing was not robust in noisy surroundings (although the CoNLL 2007 Shared Task changed that). Some researchers therefore proposed partial parsing, where completeness and depth of analysis are sacrificed for efficiency and reliability. Chunking or shallow parsing is a basic task in partial parsing.

    Chunking breaks up a sentence into syntactic constituents called chunks, each made of one or more adjacent tokens. Unlike full parsing, chunks are not analyzed further: chunking is non-recursive and fast. Chunks alone can be useful for other NLP tasks such as named entity recognition, text mining or terminology discovery. Chunks can also be a useful input to a dependency parser.

    POS tagging tags the words but doesn't bring out the syntactic structure. Chunking can be seen as being somewhere between POS tagging and full parsing. Chunking can include the POS tags. Perceptron, SVM and bidirectional MEMM are some algorithms used for chunking.
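
    Chunking is easy to experiment with using NLTK's regular-expression chunker. The sketch below is illustrative (in the style of Bird et al. 2019b): a single non-recursive rule marks noun phrase chunks over POS-tagged input.

    ```python
    import nltk
    # Assumes NLTK's 'punkt' and 'averaged_perceptron_tagger' resources are downloaded.

    tokens = nltk.word_tokenize("The little yellow dog barked at the cat")
    tagged = nltk.pos_tag(tokens)            # POS tags are the input to the chunker

    # One non-recursive rule: an NP chunk is an optional determiner,
    # any number of adjectives, then a noun.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
    print(chunker.parse(tagged))             # shallow tree: NP chunks plus unchunked tokens
    ```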

  • What are the main approaches to text parsing?

    Parsing is really a search problem. The search space of possible parse trees is defined by a grammar. An example grammar rule is "VP → VP NP". Broadly, there are two parsing strategies:

    • Top Down: Goal-driven. Starts from the root node and expands to the next level of nodes using the grammar. Checks for left-hand side match of grammar rules. This repeats until the POS tags at the leaves are reached. Trees that don't match the input are removed.
    • Bottom Up: Data-driven. Starts from the input sequence of words and their POS tags. Builds the tree upwards using the grammar. Checks for right-hand side match of grammar rules.

    While bottom-up can waste time searching trees that don't lead to the root sentence, it's grounded on the input and therefore never suggests trees that don't match the input. These pros and cons are reversed with top-down.

    A recursive descent parser is top-down. A shift-reduce parser is bottom-up. The recursive descent parser can't handle left-recursive productions. The left-corner parser is a hybrid that solves this problem.

    Borrowing from dynamic programming, chart parsing saves and reuses intermediate results. It also dynamically determines the order in which entries are processed.
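
    NLTK ships reference implementations of both strategies. Here's a minimal sketch with an illustrative toy grammar; it avoids left recursion so that the top-down recursive descent parser can also handle it.

    ```python
    import nltk

    # Illustrative toy grammar with no left-recursive productions.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'She' | Det N
    VP -> V NP
    Det -> 'a'
    N -> 'car'
    V -> 'bought'
    """)
    sent = "She bought a car".split()

    # Top-down, goal-driven: expands rules from S and matches them against the input.
    for tree in nltk.RecursiveDescentParser(grammar).parse(sent):
        print(tree)

    # Bottom-up, data-driven: shifts words onto a stack, reduces via rule right-hand sides.
    for tree in nltk.ShiftReduceParser(grammar).parse(sent):
        print(tree)
    ```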

  • What are some algorithms for text parsing?
    Joint span HPSG parsing model with self-attention layers. Source: Zhou and Zhao 2019, fig. 4.

    State-of-the-art models in 2019 were based on neural networks. For both constituency parsing and dependency parsing, a combination of self-attention transformer architecture and Head-Driven Phrase Structure Grammar (HPSG) gave the best score. Earlier models used LSTM.

    Before the use of neural networks, there were a number of algorithms: Stanford, Charniak, CJ, Bikel, Berkeley (for constituency); and Covington, Nivre, MSTParser, RelEx (for dependency). Most dependency parsers are much faster than constituency parsers. MaltParser implements Nivre and Covington algorithms. MSTParser implements Eisner and Edmonds algorithms.

    Algorithms that adopt dynamic programming include Cocke-Kasami-Younger (CKY), Earley, and chart parsing. While CKY is bottom-up, Earley is top-down.
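
    To make the dynamic programming idea concrete, here's a minimal CKY recognizer for a toy grammar in Chomsky Normal Form. The grammar and sentence are illustrative only.

    ```python
    from itertools import product

    # Toy grammar in Chomsky Normal Form: rules are either A -> B C or A -> 'word'.
    binary_rules = {("NP", "VP"): {"S"},
                    ("Det", "N"): {"NP"},
                    ("V", "NP"): {"VP"}}
    lexical_rules = {"She": {"NP"}, "bought": {"V"}, "a": {"Det"}, "car": {"N"}}

    def cky_recognize(words, start="S"):
        """Bottom-up CKY: table[i][j] holds the non-terminals spanning words[i:j]."""
        n = len(words)
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):                  # length-1 spans from the lexicon
            table[i][i + 1] = set(lexical_rules.get(w, ()))
        for span in range(2, n + 1):                   # build longer spans from shorter ones
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):              # try every split point
                    for b, c in product(table[i][k], table[k][j]):
                        table[i][j] |= binary_rules.get((b, c), set())
        return start in table[0][n]

    print(cky_recognize("She bought a car".split()))   # True
    ```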

  • How do text parsing algorithms disambiguate?
    A lexicalized parse tree and its rules. Source: Collins 2003, fig. 2.

    When there's ambiguity, we end up with multiple parse trees. This is where probabilistic grammar becomes useful: rules have associated probabilities and the most probable parse tree is selected. The well-known CKY algorithm, for example, has a probabilistic variant.

    Given a treebank such as the Penn Treebank, it's easy to obtain these probabilities by counting rule occurrences. If we don't have any such treebank, we can use the Expectation-Maximization (EM) algorithm to iteratively arrive at these probabilities.

    Another approach is to let the probabilistic model account for lexicalized rules. For example, a rule such as "VP → VBD NP PP" is changed to "VP(dumped,VBD) → VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)". It's difficult to estimate probabilities when rules are this specific, but there are techniques to do this. Two lexicalized parsers are the Collins parser and the Charniak parser.
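
    NLTK's ViterbiParser illustrates the probabilistic approach. Given a toy PCFG (the rule probabilities below are made up for illustration, not estimated from a treebank), it performs a probabilistic chart parse and returns only the most probable tree for the ambiguous pajamas sentence.

    ```python
    import nltk

    # Toy PCFG: probabilities of rules sharing a left-hand side sum to 1.
    pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    PP -> P NP [1.0]
    VP -> V NP [0.7]
    VP -> VP PP [0.3]
    NP -> Det N [0.5]
    NP -> Det N PP [0.3]
    NP -> 'I' [0.2]
    Det -> 'an' [0.5]
    Det -> 'my' [0.5]
    N -> 'elephant' [0.5]
    N -> 'pajamas' [0.5]
    V -> 'shot' [1.0]
    P -> 'in' [1.0]
    """)

    parser = nltk.ViterbiParser(pcfg)
    for tree in parser.parse("I shot an elephant in my pajamas".split()):
        print(tree)          # the single most probable parse
        print(tree.prob())   # its probability under the grammar
    ```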

  • Could you mention some text parsers that I can use?
    Sample output from Google Cloud Natural Language API. Source: Nelson 2019.

    Commonly used Python NLP packages such as NLTK and spaCy have utilities for parsing. FreeLing and Apache OpenNLP support parsing. Berkeley Parser, Stanford Parser and Stanford CoreNLP support parsing. The Stanford Parser includes a shift-reduce constituent parser and a neural network dependency parser.

    In R, udpipe is the package to use for dependency parsing.

    Two open source tools from Google are SLING and SyntaxNet. Both are based on neural networks. SLING is trained for semantic frames of interest.

    Although written for computer languages, Aho and Ullman's book (1972) is a classic reference that's applicable for natural language parsing as well.

Milestones

1950

Noam Chomsky introduces Phrase Structure Grammar (PSG) in the 1950s. While mainly used for syntax, it has found applications in semantics and phonology.

1959

Historically, Dependency Grammar (DG) predates phrase structure grammar by many centuries. However, the modern history of dependency grammar starts with Frenchman Lucien Tesnière whose main work is published posthumously. Subsequently, DG advances through the 1960s.

1967

Patterns in the form of regular expressions can be used for parsing. While regular expressions date back to the 1950s, it's in 1967 that Ken Thompson at Bell Labs brings regex from the world of mathematics to computer science for the first time. In the 1970s, regular expressions make their way into some UNIX programs.

1992

Penn Treebank is released. A treebank is a collection of text annotated with corresponding parse trees. The Penn Treebank has 4.5 million POS-tagged words, one-third of which are also annotated with skeletal syntactic bracketing. Treebanks like this one provide annotated datasets for training statistical models.

1994
Two different simplified HPSG structures (b) and (c). Source: Zhou and Zhao 2019, fig. 3.

Pollard and Sag develop a highly lexicalized, constraint-based grammar called Head-Driven Phrase Structure Grammar (HPSG). In 2004, Miyao et al. acquire this grammar from the Penn Treebank. In 2019, Zhou and Zhao develop an HPSG parser that gives better predictions on both constituent and dependency tree structures.

1995
An example of IOB tagging applied to chunks. Source: Bird et al. 2019b, fig. 2.5.

Based on Brill's earlier work (1992) on POS tagging using transformation-based learning, Ramshaw and Marcus adapt it to the problem of chunking by framing it as a tagging task. Tags denote the beginning (B), inside (I) and outside (O) of a chunk. This approach is later named IOB tagging.

2006

Dependency relations are useful in a number of NLP tasks. Existing dependency parsers are not as robust or accurate as phrase structure parsers. The Penn Treebank and other treebanks built from large corpora provide only phrase structure parses. As a solution to this problem, Stanford researchers come up with a system that automatically extracts typed dependency parses from phrase structure parses. In time, this becomes popular as the Stanford Dependencies and leads to Universal Dependencies that can be used for many natural languages.

May 2016

With the release of SyntaxNet from Google, what is claimed to be the world's most accurate English parser, Parsey McParseface, goes open source. This is built using machine learning. A neural network learns to resolve ambiguities. Multiple possible interpretations of structure are preserved via beam search rather than selecting the first best interpretation. On the Penn Treebank, this model achieves 94% accuracy, which compares well against human accuracy of 96-97%.

References

  1. Ananiadou, Sophia, Chikashi Nobata, Yutaka Sasaki, and Yoshimasa Tsuruoka. 2010. "Linguistics Techniques for Text Mining." The University of Manchester, via SlideShare, April 26. Accessed 2019-12-03.
  2. Armstrong, Alex. 2017. "Google SLING: An Open Source Natural Language Parser." i-programmer, November 17. Accessed 2019-12-03.
  3. Attardi, Giuseppe, and F. Dell'Orletta. 2008. "Chunking and Dependency Parsing." Semantic Scholar. Accessed 2019-12-04.
  4. Bird, Steven, Ewan Klein, and Edward Loper. 2019. "Analyzing Sentence Structure." Chapter 8 in Natural Language Processing with Python. Accessed 2019-12-03.
  5. Bird, Steven, Ewan Klein, and Edward Loper. 2019b. "Extracting Information from Text." Chapter 7 in Natural Language Processing with Python. Accessed 2019-12-03.
  6. Cer, Daniel, Marie-Catherine de Marneffe, Dan Jurafsky, and Chris Manning. 2010. "Parsing to Stanford Dependencies: Trade-offs between Speed and Accuracy." Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pp. 1628-1632, May. Accessed 2019-12-04.
  7. Chen, Yi-Shin. 2018. "From NLP to Text Mining." National Tsing Hua University, on SlideShare, March 15. Accessed 2019-12-03.
  8. Choi, Jinho D. 2009. "Dependency Parsing." Preliminary Exam, Univ. of Colorado, March 04. Accessed 2019-12-03.
  9. Collins, Michael. 2003. "Head-Driven Statistical Models for Natural Language Parsing." Computational Linguistics, vol. 29, no. 4. pp. 589-637. Accessed 2019-12-03.
  10. Daniel, Ron. 2015. "New open access resource will support text mining and natural language processing." Elsevier Connect, March 12. Accessed 2019-12-03.
  11. Gatius, Marta. 2019. "Basic issues on Parsing." Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. Accessed 2019-12-03.
  12. Ishaq, Muhammad. 2019. "What are some of the challenges we face in NLP today?" Data Driven Investor, on Medium, January 16. Accessed 2019-12-03.
  13. Jurafsky, Daniel and James H. Martin. 2009. "Speech and Language Processing." Second Edition, Prentice-Hall, Inc. Accessed 2019-12-05.
  14. Leech, G., R. Barnett, and P. Kahrel. 1996. "Phrase structure vs dependency." EAGLES, Recommendations for the Syntactic Annotation of Corpora, Lancaster University, March 11. Accessed 2019-12-03.
  15. Levithan, Steven. 2008. "Regex Legends: The People Behind the Magic." Flagrant Badassery, January 13. Accessed 2019-05-04.
  16. Marcus, Mitchell, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. 1994. "The Penn Treebank: Annotating Predicate Argument Structure." HUMAN LANGUAGE TECHNOLOGY: Proceedings of a Workshop held at Plainsboro, New Jersey, pp. 114-119, March 8-11. Accessed 2019-12-05.
  17. Mukherji, Aditya. 2012. "What Is the Difference Between POS Tagging and Shallow Parsing?" Stackoverflow, January 25. Accessed 2019-12-04.
  18. Nelson, Paul. 2019. "Natural Language Processing (NLP) Techniques for Extracting Information." Search & Content Analytics, Accenture Blog. Accessed 2019-12-03.
  19. Petrov, Slav. 2016. "Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source." Google AI Blog, May 12. Accessed 2019-12-03.
  20. Ritchie, Dennis. 2019. "An incomplete history of the QED Text Editor." Accessed 2019-05-08.
  21. Ruder, Sebastian. 2019a. "Constituency parsing." NLP-progress, November 17. Accessed 2019-12-03.
  22. Ruder, Sebastian. 2019b. "Dependency parsing." NLP-progress, November 10. Accessed 2019-12-03.
  23. Sarkar, Dipanjan. 2018. "Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP." KDnuggets, August. Accessed 2019-12-05.
  24. Song, Min. 2019. "Explanations of Dependency Parsing." Section 3.3 of Hands-on Text Mining and Analytics, Yonsei University, on Coursera. Accessed 2019-12-03.
  25. Stanford NLP. 2018. "The Stanford Parser: A statistical parser." Version 3.9.2, October 17. Accessed 2019-12-03.
  26. Tomassetti, Gabriele. 2017a. "A Guide to Parsing: Algorithms and Terminology." September 27. Accessed 2019-12-03.
  27. Tomassetti, Gabriele. 2017b. "Analyze and Understand Text: Guide to Natural Language Processing." November 14. Accessed 2019-12-03.
  28. Van Eynde, Frank. 2017. "Phrase Structure Grammars." Oxford Bibliographies, October 25. Accessed 2019-12-05.
  29. Wikipedia. 2019. "Dependency grammar." Wikipedia, August 29. Accessed 2019-12-05.
  30. Zhou, Junru, and Hai Zhao. 2019. "Head-Driven Phrase Structure Grammar Parsing on Penn Treebank." Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2396-2408, July. Accessed 2019-12-05.
  31. bnosac. 2018. "Natural Language Processing for non-English languages with udpipe." Blog, BNOSAC. Accessed 2019-12-03.
  32. de Marneffe, Marie-Catherine, Bill MacCartney, and Christopher D. Manning. 2006. "Generating Typed Dependency Parses from Phrase Structure Parses." Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), pp. 449-454, May. Accessed 2019-12-05.
  33. de Marneffe, Marie-Catherine, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. "Universal Stanford dependencies: A cross-linguistic typology." Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 4585-4592, May. Accessed 2019-12-05.

Further Reading

  1. Jurafsky, Daniel and James H. Martin. 2009. "Speech and Language Processing." Chapters 13 & 14, Second Edition, Prentice-Hall, Inc. Accessed 2019-12-05.
  2. Sarkar, Dipanjan. 2018. "Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP." KDnuggets, August. Accessed 2019-12-05.
  3. Chen, Yi-Shin. 2018. "From NLP to Text Mining." National Tsing Hua University, on SlideShare, March 15. Accessed 2019-12-03.
  4. Tomassetti, Gabriele. 2017a. "A Guide to Parsing: Algorithms and Terminology." September 27. Accessed 2019-12-03.
