Semantic Role Labelling

Author: arvindpdmn
Created: 2019-12-29 04:55:32
Last updated: 2020-01-10 08:42:21

Summary

For the verb 'loaded', semantic roles of other words and phrases in the sentence are identified. Source: Lascarides 2019, slide 10.

In linguistics, the predicate refers to the main verb in a sentence. A predicate takes arguments. The role of Semantic Role Labelling (SRL) is to determine how these arguments are semantically related to the predicate.

Consider the sentence "Mary loaded the truck with hay at the depot on Friday". 'Loaded' is the predicate. Mary, truck and hay have respective semantic roles of loader, bearer and cargo. We can identify additional roles of location (depot) and time (Friday). The job of SRL is to identify these roles so that downstream NLP tasks can "understand" the sentence.
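
As a rough illustration of what an SRL system might hand to a downstream task, the example sentence can be stored as a predicate with labelled arguments. This is only a sketch: the class names `Argument` and `PredicateFrame` and the method `role_of` are hypothetical, and the role labels are simply the ones used in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    role: str  # semantic role label, e.g. "loader"
    text: str  # the words that fill the role


@dataclass
class PredicateFrame:
    predicate: str
    arguments: List[Argument] = field(default_factory=list)

    def role_of(self, text: str) -> Optional[str]:
        """Return the role filled by `text`, or None if it fills no role."""
        for argument in self.arguments:
            if argument.text == text:
                return argument.role
        return None


# "Mary loaded the truck with hay at the depot on Friday"
frame = PredicateFrame(
    predicate="loaded",
    arguments=[
        Argument("loader", "Mary"),
        Argument("bearer", "the truck"),
        Argument("cargo", "hay"),
        Argument("location", "the depot"),
        Argument("time", "Friday"),
    ],
)

print(frame.role_of("hay"))
```

A question such as "What was loaded?" then reduces to looking up the argument whose role is cargo.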

SRL is also known by other names such as thematic role labelling, case role assignment, or shallow semantic parsing.

Milestones

350 BC

Indian grammarian Pāṇini authors Aṣṭādhyāyī, a treatise on Sanskrit grammar. It records rules of linguistics, syntax and semantics. His work is discovered only in the 19th century by European scholars. His work identifies semantic roles under the name of kāraka.

1965

In what may be the beginning of modern thematic roles, Gruber gives the example of motional verbs (go, fly, swim, enter, cross) and states that the entity conceived of as being moved is the theme. The theme is syntactically and semantically significant to the sentence and its situation. A related development of semantic roles is due to Fillmore (1968).

1991

Dowty notes that all through the 1980s new thematic roles were proposed. There's no consensus even on the common thematic roles. A large number of roles results in role fragmentation and inhibits useful generalizations. As an alternative, he proposes Proto-Agent and Proto-Patient based on verb entailments. An argument may be either or both of these in varying degrees. He then considers both fine-grained and coarse-grained verb arguments, and 'role hierarchies'. Essentially, Dowty focuses on the mapping problem, which is about how syntax maps to semantics.

Sep 1993

Beth Levin publishes English Verb Classes and Alternations. This work classifies over 3,000 verbs by meaning and behaviour. She hypothesizes that a verb's meaning influences its syntactic behaviour. She then shows how identifying verbs with similar syntactic structures can lead us to semantically coherent verb classes. For example, "John cut the bread", "John cut at the bread" and "Bread cuts easily" are all valid, but 'cut' can't be used in the form "The bread cut".

1997
FrameNet workflows, roles, data structures and software. Source: Baker et al. 1998, fig. 3.

FrameNet is launched as a three-year NSF-funded project. Semantic information is manually annotated on large corpora along with descriptions of semantic frames. Conceptual structures are called frames. Role names are called frame elements. For example, in the Transportation frame, Driver, Vehicle, Rider, and Cargo are possible frame elements.

2000

Kipper et al. at the University of Pennsylvania create VerbNet. This is a verb lexicon that includes syntactic and semantic information. In 2004 and 2005, other researchers extend the Levin classification with more classes. In 2008, Kipper et al. use Levin-style classification on PropBank with 90% coverage, thus providing a useful resource for researchers.

2000

Just as the Penn Treebank has enabled syntactic parsing, the Proposition Bank or PropBank project is proposed to build a semantic lexical resource to aid research into linguistic semantics. The idea is to add a layer of predicate-argument structure to the Penn Treebank II corpus. By 2005, this corpus is complete. It uses VerbNet classes. In time, PropBank becomes the preferred resource for SRL since FrameNet's annotated examples are not a representative sample of the language.

2002

Making use of FrameNet, Gildea and Jurafsky apply statistical techniques to identify semantic roles filled by constituents. Their work also studies different features and their combinations. They also explore how syntactic parsing can integrate with SRL. Other techniques explored are automatic clustering, WordNet hierarchy, and bootstrapping from unlabelled data. In the coming years, this work influences greater application of statistics and machine learning to SRL.

2004

Swier and Stevenson note that SRL approaches are typically supervised and rely on manually annotated FrameNet or PropBank. They propose an unsupervised "bootstrapping" method. They start with unambiguous role assignments based on a verb lexicon. In further iterations, they use the probability model derived from current role assignments. This may well be the first instance of unsupervised SRL.

Jun 2008

Punyakanok et al. apply full syntactic parsing to the task of SRL. They show that full parsing matters most during the pruning stage. Based on the CoNLL-2005 Shared Task, they also show that when outputs of two different constituent parsers (Collins and Charniak) are combined, the resulting performance is much higher. They call this joint inference.

Oct 2008
An example sentence with both syntactic and semantic dependency annotations. Source: Johansson and Nugues 2008, fig. 1.

Johansson and Nugues note that state-of-the-art use of parse trees is based on constituent parsing and that not much has been achieved with dependency parsing. This is due to low parsing accuracy. They use the dependency-annotated Penn TreeBank from the 2008 CoNLL Shared Task on joint syntactic-semantic analysis. Using only dependency parsing, they achieve state-of-the-art results.

2009

Researchers propose SemLink as a tool to map PropBank representations to VerbNet or FrameNet. PropBank provides the best training data. VerbNet excels in linking semantics and syntax. FrameNet provides the richest semantics. SemLink allows us to use the best of all three lexical resources. For example, VerbNet can be used to merge PropBank and FrameNet to expand training resources. By 2014, SemLink integrates OntoNotes sense groupings, WordNet and WSJ Tokens as well. Shi and Mihalcea (2005) presented an earlier work on combining FrameNet, VerbNet and WordNet.

2015
Confirmation that Proto-Agent and Proto-Patient properties predict subject and object respectively. Source: Reisinger et al. 2015, fig. 4-5.

Inspired by Dowty's work on proto roles in 1991, Reisinger et al. produce a large-scale corpus-based annotation. They use PropBank as the data source and the Mechanical Turk crowdsourcing platform for annotation. They confirm that fine-grained role properties predict the mapping of semantic roles to argument position. In 2016, this work leads to Universal Decompositional Semantics, which adds semantics to the syntax of Universal Dependencies.

Nov 2017
Neural network architecture of the SLING parser. Source: Ringgaard et al. 2017, fig. 1.

Google open sources SLING, which represents the meaning of a sentence as a semantic frame graph. Unlike a traditional SRL pipeline that involves dependency parsing, SLING avoids intermediate representations and directly captures semantic annotations. It uses an encoder-decoder architecture. Simple lexical features (raw word, suffix, punctuation, etc.) are used to represent input words. The decoder computes a sequence of transitions and updates the frame graph. The system is based on the frame semantics of Fillmore (1982).

Sep 2019
SpanGCN encoder: red/black lines represent parent-child/child-parent relations respectively. Source: Marcheggiani and Titov 2019, fig. 2.

While dependency parsing has become popular lately, it's really constituents that act as predicate arguments. Marcheggiani and Titov use a Graph Convolutional Network (GCN) in which graph nodes represent constituents and graph edges represent parent-child relations. BiLSTM states represent start and end tokens of constituents. Their earlier work from 2017 also used a GCN but to model dependency relations.

Discussion

  • Why do we need semantic role labelling when there's already parsing?
    A TreeBanked sentence also PropBanked with semantic role labels. Source: Palmer 2013, slide 6.

    Often an idea can be expressed in multiple ways. Consider these sentences that all mean the same thing: "Yesterday, Kristina hit Scott with a baseball"; "Scott was hit by Kristina yesterday with a baseball"; "With a baseball, Kristina hit Scott yesterday"; "Kristina hit Scott with a baseball yesterday".

    Either constituent or dependency parsing will analyze these sentences syntactically. But syntactic relations don't necessarily help in determining semantic roles. One way to understand SRL is via an analogy. In image captioning, we extract the main objects in the picture, how they are related and the background scene. This is precisely what SRL does but from unstructured input text. Such an understanding goes beyond syntax.

    However, parsing is not completely useless for SRL. In a traditional SRL pipeline, a parse tree helps in identifying the predicate arguments.

    But SRL performance can be impacted if the parse tree is wrong. This has motivated SRL approaches that completely ignore syntax. However, many research papers through the 2010s have shown how syntax can be effectively used to achieve state-of-the-art SRL.

  • What are some applications of SRL?
    SRL is helpful for question answering. Source: Yih and Toutanova 2006, slide 2.

    SRL is useful in any NLP application that requires semantic understanding: machine translation, information extraction, text summarization, question answering, and more. For example, predicates and heads of roles help in document summarization. For information extraction, SRL can be used to construct extraction rules.

    SRL can be seen as answering "who did what to whom". Obtaining semantic information thus benefits many downstream NLP tasks such as question answering, dialogue systems, machine reading, machine translation, text-to-scene generation, and social network analysis.

    Historically, early applications of SRL include Wilks (1973) for machine translation; Hendrix et al. (1973) for question answering; Nash-Webber (1975) for spoken language understanding; and Bobrow et al. (1977) for dialogue systems.

  • Which are the essential roles used in SRL?
    Thematic roles with examples. Source: Jurafsky 2015, slide 10.

    One of the oldest models is thematic roles, which date back to Pāṇini in about the 4th century BC. Roles are assigned to subjects and objects in a sentence. Roles are based on the type of event. For example, if the verb is 'breaking', roles would be breaker and broken thing for subject and object respectively. Some examples of thematic roles are agent, experiencer, result, content, instrument, and source. There's no well-defined universal set of thematic roles.

    A modern alternative from 1991 is proto-roles that defines only two roles: Proto-Agent and Proto-Patient. Using heuristic features, algorithms can say if an argument is more agent-like (intentionality, volitionality, causality, etc.) or patient-like (undergoing change, affected by, etc.).
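
    The agent-like versus patient-like decision can be sketched as a simple property count. This is a minimal illustration, not Dowty's full proposal: the property names are taken loosely from the paragraph above, the function `proto_role` is hypothetical, and a real system would have to predict the properties from text rather than receive them as input.

```python
# Illustrative property inventories (assumed for this sketch).
AGENT_PROPERTIES = {"volitional", "intentional", "causes_event"}
PATIENT_PROPERTIES = {"undergoes_change", "affected"}


def proto_role(properties):
    """Say whether an argument looks more agent-like or more patient-like."""
    agent_score = len(properties & AGENT_PROPERTIES)
    patient_score = len(properties & PATIENT_PROPERTIES)
    if agent_score > patient_score:
        return "Proto-Agent"
    if patient_score > agent_score:
        return "Proto-Patient"
    return "undecided"


# "Kristina hit Scott": hand-assigned properties for the two arguments.
print(proto_role({"volitional", "intentional", "causes_event"}))  # Kristina
print(proto_role({"affected"}))                                   # Scott
```

    Because the decision rests on counts, an argument can carry properties of both kinds, which matches the idea that an argument may be either or both roles in varying degrees.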

  • How are VerbNet, PropBank and FrameNet relevant to SRL?
    Comparing PropBank and FrameNet representations. Source: Jurafsky 2015, slide 37.

    Verbs can realize semantic roles of their arguments in multiple ways. This is called verb alternations or diathesis alternations. Consider "Doris gave the book to Cary" and "Doris gave Cary the book". The verb 'gave' realizes THEME (the book) and GOAL (Cary) in two different ways. VerbNet is a resource that groups verbs into semantic classes and their alternations.

    PropBank contains sentences annotated with proto-roles and verb-specific semantic roles. Arguments to verbs are simply named Arg0, Arg1, etc. Typically, Arg0 is the Proto-Agent and Arg1 is the Proto-Patient. Being also verb-specific, PropBank records roles for each sense of the verb. For example, for the word sense 'agree.01', Arg0 is the Agreer, Arg1 is Proposition, and Arg2 is other entity agreeing.
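
    A per-sense role inventory of this kind can be pictured as a lookup table. The sketch below is an assumption-laden illustration: `FRAMESETS` and `explain` are hypothetical names, the table holds only the 'agree.01' entry described above, and the annotated sentence is made up for the example rather than taken from PropBank.

```python
# Verb-sense specific meanings of the numbered arguments.
FRAMESETS = {
    "agree.01": {
        "Arg0": "Agreer",
        "Arg1": "Proposition",
        "Arg2": "Other entity agreeing",
    },
}


def explain(sense, labelled_spans):
    """Translate numbered argument labels into the sense-specific role names."""
    roles = FRAMESETS[sense]
    return [(roles[label], text) for label, text in labelled_spans]


# Hypothetical annotation of "John agreed with Mary on the budget".
annotation = [("Arg0", "John"), ("Arg2", "with Mary"), ("Arg1", "on the budget")]
readable = explain("agree.01", annotation)
print(readable)
```

    The numbered labels stay the same across verbs, while the lookup supplies what each number means for a particular sense.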

    An idea can be expressed with similar words such as increased (verb), rose (verb), or rise (noun). PropBank may not handle this very well. FrameNet is another lexical resource, defined in terms of frames rather than verbs. For every frame, core roles and non-core roles are defined. Frames can inherit from or causally link to other frames.

  • What's the typical SRL processing pipeline?

    SRL involves predicate identification, predicate disambiguation, argument identification, and argument classification.

    Argument identification is aided by full parse trees. However, in some domains such as biomedicine, full parse trees may not be available. In such cases, chunking is used instead.

    When a full parse is available, pruning is an important step. Using heuristic rules, we can discard constituents that are unlikely arguments. In fact, full parsing contributes most in the pruning step. Pruning is a recursive process.
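
    One simple heuristic of this kind keeps only the sisters of the predicate and the sisters of each of its ancestors, discarding every other constituent. The sketch below assumes a hypothetical minimal `Node` class and is a simplified illustration of recursive pruning, not the exact rule set of any particular system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A constituent in a parse tree (hypothetical minimal structure)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def candidate_arguments(node: Node) -> List[Node]:
    """Collect the sisters of `node`, then recurse on its parent."""
    if node.parent is None:  # reached the root: nothing left to collect
        return []
    sisters = [c for c in node.parent.children if c is not node]
    return sisters + candidate_arguments(node.parent)


# (S (NP Mary) (VP (VBD loaded) (NP the truck) (PP with hay)))
predicate = Node("VBD:loaded")
vp = Node("VP").add(predicate, Node("NP:the truck"), Node("PP:with hay"))
root = Node("S").add(Node("NP:Mary"), vp)

kept = [n.label for n in candidate_arguments(predicate)]
print(kept)  # ['NP:the truck', 'PP:with hay', 'NP:Mary']
```

    Constituents nested inside the kept ones are never proposed as candidates, which is where most of the pruning comes from in larger trees.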

    If each argument is classified independently, we ignore interactions among arguments. A better approach is to assign multiple possible labels to each argument. Then we can use global context to select the final labels. This step is called reranking.
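
    The effect of global context can be shown with a toy joint selection. The sketch assumes each argument comes with a few candidate labels and log-scores, and it applies a single hard constraint (no core role may repeat) by brute-force enumeration. The function name, scores and constraint are illustrative; practical rerankers usually learn a global scoring model instead.

```python
from itertools import product


def best_joint_labelling(candidates):
    """Pick the highest-scoring label combination in which no core role repeats.

    `candidates` holds, for each argument, a list of (label, log_score) pairs.
    """
    best_labels, best_score = None, float("-inf")
    for combination in product(*candidates):
        labels = [label for label, _ in combination]
        core = [label for label in labels if label.startswith("Arg")]
        if len(core) != len(set(core)):  # a core role was assigned twice
            continue
        score = sum(log_score for _, log_score in combination)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels


# Both arguments individually prefer Arg1 (illustrative scores).
candidates = [
    [("Arg0", -0.9), ("Arg1", -0.6)],
    [("Arg1", -0.2), ("Arg2", -1.5)],
]
print(best_joint_labelling(candidates))  # ['Arg0', 'Arg1']
```

    Classified independently, both arguments would be labelled Arg1. Considering them together resolves the clash in favour of the assignment with the best combined score.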

  • Which are the main approaches to SRL?

    Early SRL systems were rule based, with rules derived from grammar. From the mid-1990s, statistical approaches became popular because FrameNet and PropBank provided training data. Classifiers could be trained from feature sets. A set of features might include the predicate, constituent phrase type, head word and its POS, predicate-constituent path, voice (active/passive), constituent position (before/after predicate), and so on.
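
    Of these features, the predicate-constituent path is the least obvious to compute. A minimal sketch, assuming a hypothetical `Node` class with parent links and both nodes in the same tree: climb from the predicate to the lowest shared ancestor, then descend to the constituent, recording phrase labels along the way.

```python
class Node:
    """A parse-tree node with a parent link (hypothetical minimal structure)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def chain_to_root(node):
    """Return the node followed by all of its ancestors, bottom-up."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def path_feature(predicate, constituent):
    """Label path from predicate up to the shared ancestor and down to the constituent."""
    up = chain_to_root(predicate)
    down = chain_to_root(constituent)
    down_ids = {id(n) for n in down}
    climb = []
    for node in up:  # climb until we meet an ancestor of the constituent
        climb.append(node)
        if id(node) in down_ids:
            break
    common = climb[-1]
    descend = []
    for node in down:  # nodes strictly below the shared ancestor
        if node is common:
            break
        descend.append(node)
    rising = "↑".join(n.label for n in climb)
    falling = "".join("↓" + n.label for n in reversed(descend))
    return rising + falling


# (S (NP Mary) (VP (VBD loaded) (NP the truck) (PP with hay)))
subject, verb, obj, pp = Node("NP"), Node("VBD"), Node("NP"), Node("PP")
sentence = Node("S", [subject, Node("VP", [verb, obj, pp])])

print(path_feature(verb, subject))  # VBD↑VP↑S↓NP
print(path_feature(verb, obj))      # VBD↑VP↓NP
```

    The two strings differ for subject and object, which is why a classifier can use the path as a cue for roles even though it is a purely syntactic feature.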

    SRL has traditionally been a supervised task but adequate annotated resources for training are scarce. Research from the early 2010s focused on inducing semantic roles and frames. There's also been research on transferring an SRL model to low-resource languages.

    One novel approach trains a supervised model using question-answer pairs. Given a sentence, even non-experts can accurately generate a number of diverse pairs. We therefore don't need to compile a pre-defined inventory of semantic roles or frames.

  • Which are the neural network approaches to SRL?
    Architecture and details of LISA for SRL. Source: Strubell et al. 2018, fig. 1-2.

    Neural network approaches to SRL have been the state of the art since the mid-2010s. We note a few of them.

    Roth and Lapata (2016) used the dependency path between a predicate and its argument. Words and relations along the path are represented and input to an LSTM. Another input layer encodes binary features. A hidden layer combines the two inputs using ReLUs. Finally, there's a classification layer.

    He et al. (2017) used a deep BiLSTM with highway connections and recurrent dropout. The input is word-predicate pairs, and a softmax output layer predicts tags in BIO notation. GloVe embeddings were used for the input. Another research group also used a BiLSTM with highway connections but used CNN+BiLSTM to learn character embeddings for the input.
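
    Whatever network produces them, BIO tags have to be turned back into labelled spans. A minimal sketch of that decoding step, assuming one tag per token; the function name, the tag strings and the example tagging are illustrative and not taken from any of the cited systems.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (label, text) spans; tokens tagged O are skipped."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:  # the current span ends before token i
            spans.append((label, " ".join(tokens[start:i])))
            label = None
        if tag.startswith(("B-", "I-")):  # a new span starts here
            label, start = tag[2:], i
    return spans


tokens = "Mary loaded the truck with hay".split()
tags = ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "B-ARG2", "I-ARG2"]
print(bio_to_spans(tokens, tags))
# [('ARG0', 'Mary'), ('V', 'loaded'), ('ARG1', 'the truck'), ('ARG2', 'with hay')]
```

    An `I-` tag that does not continue the current label is treated as the start of a new span, one common way of tolerating inconsistent tagger output.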

    Since 2018, self-attention has been used for SRL. Strubell et al. (2018) applied it to train a model to jointly predict POS tags and predicates, do parsing, attend to syntactic parse parents, and assign semantic roles. One of the self-attention layers attends to syntactic relations. Shi and Lin (2019) used BERT for SRL without using syntactic features and still got state-of-the-art results.

References

  1. Baker, Collin F., Charles J. Fillmore, and John B. Lowe. 1998. "The Berkeley FrameNet Project." 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, ACL, pp. 86-90, August. Accessed 2019-01-10.
  2. Dowty, David. 1991. "Thematic proto-roles and argument selection." Language, vol. 67, no. 3, pp. 547-619, Linguistic Society of America. Accessed 2019-12-29.
  3. FitzGerald, Nicholas, Julian Michael, Luheng He, and Luke Zettlemoyer. 2018. "Large-Scale QA-SRL Parsing." arXiv, v1, May 14. Accessed 2019-12-28.
  4. Gildea, Daniel, and Daniel Jurafsky. 2002. "Automatic Labeling of Semantic Roles." Computational Linguistics, vol. 28, no. 3, pp. 245-288, September. Accessed 2019-12-29.
  5. Gruber, Jeffrey S. 1965. "Studies in Lexical Relations." Thesis, MIT, September. Accessed 2019-01-10.
  6. He, Luheng, Mike Lewis, and Luke Zettlemoyer. 2015. "Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 643-653, September. Accessed 2019-12-28.
  7. He, Luheng, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. "Deep Semantic Role Labeling: What Works and What’s Next." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 473-483, July. Accessed 2019-12-28.
  8. He, Shexia, Zuchao Li, Hai Zhao, and Hongxiao Bai. 2018a. "Syntax for Semantic Role Labeling, To Be, Or Not To Be." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL, pp. 2061-2071, July. Accessed 2019-12-29.
  9. He, Luheng, Kenton Lee, Omer Levy, and Luke Zettlemoyer. 2018b. "Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 364-369, July. Accessed 2019-12-28.
  10. Johansson, Richard, and Pierre Nugues. 2008. "Dependency-based Semantic Role Labeling of PropBank." Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 69-78, October. Accessed 2019-12-28.
  11. Jurafsky, Daniel. 2015. "Semantic Role Labeling." Slides, Stanford University, August 8. Accessed 2019-12-28.
  12. Jurafsky, Daniel and James H. Martin. 2009. "Speech and Language Processing." Second Edition, Prentice-Hall, Inc. Accessed 2019-12-25.
  13. Kingsbury, Paul and Martha Palmer. 2002. "From Treebank to PropBank." In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Spain, pp. 1989-1993. Accessed 2019-01-10.
  14. Kipper, Karin, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. "A large-scale classification of English verbs." Language Resources and Evaluation, vol. 42, no. 1, pp. 21-40, March. Accessed 2019-12-29.
  15. Kozhevnikov, Mikhail, and Ivan Titov. 2013. "Cross-lingual Transfer of Semantic Role Labeling Models." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL, pp. 1190-1200, August. Accessed 2019-12-28.
  16. Lascarides, Alex. 2019. "Semantic Role Labelling and Argument Structure." Lecture 16, Foundations of Natural Language Processing, School of Informatics, Univ. of Edinburgh, August 28. Accessed 2019-12-28.
  17. Levin, Beth. 1993. "English Verb Classes and Alternations." University of Chicago Press. Accessed 2019-12-29.
  18. Lim, Soojong, Changki Lee, and Dongyul Ra. 2013. "Dependency-based semantic role labeling using sequence labeling with a structural SVM." Pattern Recognition Letters, vol. 34, no. 6, pp. 696-702, April 15. Accessed 2019-12-28.
  19. Marcheggiani, Diego, and Ivan Titov. 2017. "Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 1506-1515, September. Accessed 2019-12-28.
  20. Marcheggiani, Diego, and Ivan Titov. 2019. "Graph Convolutions over Constituent Trees for Syntax-Aware Semantic Role Labeling." arXiv, v1, September 21. Accessed 2019-12-28.
  21. Màrquez, Lluís, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. 2008. "Semantic Role Labeling: An Introduction to the Special Issue." Computational Linguistics, vol. 34, no. 2, pp. 145-159, June. Accessed 2019-12-28.
  22. Palmer, Martha. 2013. "Linguistic Background, Resources, Annotation." Part 1, Semantic Role Labeling Tutorial, NAACL, June 9. Accessed 2019-12-28.
  23. Palmer, Martha, Dan Gildea, and Paul Kingsbury. 2005. "The Proposition Bank: A Corpus Annotated with Semantic Roles." Computational Linguistics Journal, vol. 31, no. 1, March. Accessed 2019-12-29.
  24. Palmer, Martha, Claire Bonial, and Diana McCarthy. 2014. "SemLink+: FrameNet, VerbNet and Event Ontologies." Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014), ACL, pp. 13-17, June. Accessed 2019-12-29.
  25. Punyakanok, Vasin, Dan Roth, and Wen-tau Yih. 2008. "The Importance of Syntactic Parsing and Inference in Semantic Role Labeling." Computational Linguistics, vol. 34, no. 2, pp. 257-287, June. Accessed 2019-12-28.
  26. Reisinger, Drew, Rachel Rudinger, Francis Ferraro, Craig Harman, Kyle Rawlins, and Benjamin Van Durme. 2015. "Semantic Proto-Roles." Transactions of the Association for Computational Linguistics, vol. 3, pp. 475-488. Accessed 2019-01-10.
  27. Ringgaard, Michael and Rahul Gupta. 2017. "SLING: A Natural Language Frame Semantic Parser." Google AI Blog, November 15. Accessed 2019-12-28.
  28. Ringgaard, Michael, Rahul Gupta, and Fernando C. N. Pereira. 2017. "SLING: A framework for frame semantic parsing." arXiv, v1, October 19. Accessed 2019-12-29.
  29. Roth, Michael, and Mirella Lapata. 2015. "Context-aware Frame-Semantic Role Labeling." Transactions of the Association for Computational Linguistics, vol. 3, pp. 449-460. Accessed 2019-12-28.
  30. Roth, Michael, and Mirella Lapata. 2016. "Neural Semantic Role Labeling with Dependency Path Embeddings." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL, pp. 1192-1202, August. Accessed 2019-12-28.
  31. Ruder, Sebastian. 2019. "Semantic role labeling." NLP-progress, December 4. Accessed 2019-12-28.
  32. Shi, Peng, and Jimmy Lin. 2019. "Simple BERT Models for Relation Extraction and Semantic Role Labeling." arXiv, v1, April 10. Accessed 2019-12-28.
  33. Shi, Lei and Rada Mihalcea. 2005. "Putting Pieces Together: Combining FrameNet, VerbNet and WordNet for Robust Semantic Parsing." In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg, pp. 100-111. Accessed 2019-12-29.
  34. Strubell, Emma, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. "Linguistically-Informed Self-Attention for Semantic Role Labeling." arXiv, v3, November 12. Accessed 2019-12-28.
  35. Swier, Robert S., and Suzanne Stevenson. 2004. "Unsupervised Semantic Role Labelling." Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 95-102, July. Accessed 2019-12-29.
  36. Titov, Ivan. 2019. "Inducing Semantic Representations From Text." Accessed 2019-12-29.
  37. Wikipedia. 2019a. "Argument (linguistics)." Wikipedia, November 23. Accessed 2019-12-29.
  38. Wikipedia. 2019b. "Pāṇini." Wikipedia, December 18. Accessed 2019-12-28.
  39. Yih, Scott Wen-tau and Kristina Toutanova. 2006. "Automatic Semantic Role Labeling." HLT-NAACL-06 Tutorial, June 4. Accessed 2019-12-28.
  40. Škrjanec, Iza. 2018. "Predicate-argument structure and thematic roles." Universität des Saarlandes. Accessed 2019-12-29.

Further Reading

  1. Màrquez, Lluís, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. 2008. "Semantic Role Labeling: An Introduction to the Special Issue." Computational Linguistics, vol. 34, no. 2, pp. 145-159, June. Accessed 2019-12-28.
  2. Punyakanok, Vasin, Dan Roth, and Wen-tau Yih. 2008. "The Importance of Syntactic Parsing and Inference in Semantic Role Labeling." Computational Linguistics, vol. 34, no. 2, pp. 257-287, June. Accessed 2019-12-28.
  3. Gildea, Daniel, and Daniel Jurafsky. 2002. "Automatic Labeling of Semantic Roles." Computational Linguistics, vol. 28, no. 3, pp. 245-288, September. Accessed 2019-12-29.
  4. He, Luheng. 2017. "Deep Semantic Role Labeling: What Works and What's Next." Allen Institute for AI, on YouTube, May 21. Accessed 2019-12-28.
  5. Guan, Chaoyu, Yuhao Cheng, and Hai Zhao. 2019. "Semantic Role Labeling with Associated Memory Network." arXiv, v1, August 5. Accessed 2019-12-28.
  6. Christensen, Janara, Mausam, Stephen Soderland, and Oren Etzioni. 2010. "Semantic Role Labeling for Open Information Extraction." Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, ACL, pp. 52-60, June. Accessed 2019-12-28.

Cite As

Devopedia. 2020. "Semantic Role Labelling." Version 3, January 10. Accessed 2020-04-02. https://devopedia.org/semantic-role-labelling