Relation Extraction
Summary
Consider the phrase "President Clinton was in Washington today". This describes a Located relation between Clinton and Washington. Another example is "Steve Ballmer, CEO of Microsoft, said…", which describes a Role relation of Steve Ballmer within Microsoft.
The task of extracting semantic relations between entities in text is called Relation Extraction (RE). While Named Entity Recognition (NER) is about identifying entities in text, RE is about finding the relations among the entities. Given unstructured text, NER and RE help us obtain useful structured representations. Both tasks are part of the discipline of Information Extraction (IE).
Supervised, semi-supervised, and unsupervised approaches exist to do RE. In the 2010s, neural network architectures were applied to RE. Sometimes the term Relation Classification is used, particularly in approaches that treat it as a classification problem.
Discussion
What sort of relations are captured in relation extraction?
Here are some relations with examples:
- located-in: CMU is in Pittsburgh
- father-of: Manuel Blum is the father of Avrim Blum
- person-affiliation: Bill Gates works at Microsoft Inc.
- capital-of: Beijing is the capital of China
- part-of: American Airlines, a unit of AMR Corp., immediately matched the move
In general, affiliations involve persons, organizations or artifacts. Geospatial relations involve locations. Part-of relations involve organizations or geo-political entities.
An entity tuple is the common way to represent entities bound in a relation. Given \(n\) entities in a relation \(r\), the notation is \(r(e_{1},e_{2},...,e_{n})\). An example use of this notation is Located-In(CMU, Pittsburgh).
RE mostly deals with binary relations, where n=2. For n>2, the term used is higher-order relations. An example of a 4-ary biomedical relation is point_mutation(codon, 12, G, T), from the sentence "At codons 12, the occurrence of point mutations from G to T were observed".
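To make the notation concrete, here's a minimal Python sketch of relation tuples (the Relation type and names are illustrative, not from any library):

```python
from collections import namedtuple

# A generic relation tuple r(e1, e2, ..., en).
Relation = namedtuple("Relation", ["name", "entities"])

# Binary relation: Located-In(CMU, Pittsburgh)
located_in = Relation("Located-In", ("CMU", "Pittsburgh"))

# Higher-order (4-ary) biomedical relation: point_mutation(codon, 12, G, T)
mutation = Relation("point_mutation", ("codon", "12", "G", "T"))

print(located_in.name, located_in.entities)  # Located-In ('CMU', 'Pittsburgh')
```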
What are some common applications of relation extraction?
Since structured information is easier to use than unstructured text, relation extraction is useful in many NLP applications. RE enriches existing information. Once relations are obtained, they can be stored in databases for future queries. They can be visualized and correlated with other information in the system.
In question answering, one might ask "When was Gandhi born?" Such a factoid question can be answered if our relation database has stored the relation Born-In(Gandhi, 1869).
In the biomedical domain, protein binding relations can lead to drug discovery. When relations are extracted from a sentence such as "Gene X with mutation Y leads to malignancy Z", they can help us detect cancerous genes. Another example is determining the location of a protein in an organism. This ternary relation is split into two binary relations (Protein-Organism and Protein-Location). Once these are classified, the results are merged into a ternary relation.
Which are the main techniques for doing relation extraction?
With supervised learning, the model is trained on annotated text in which entities and their relations are marked up. Training involves a binary classifier that detects the presence of a relation, and a classifier to label the relation. For labelling, we could use SVMs, decision trees, Naive Bayes or MaxEnt. Supervised methods are either feature-based or kernel-based.
Since finding large annotated datasets is difficult, a semi-supervised approach is more practical. One approach is to do a phrasal search with wildcards. For example,
[ORG] has a hub at [LOC]
would return organizations and their hub locations. If we relax the pattern, we'll get more matches but also false positives. An alternative is to use a set of specific patterns, induced from an initial set of seed patterns and seed tuples. This approach is called bootstrapping. For example, given the seed tuple hub(Ryanair, Charleroi), we can discover many phrasal patterns in unlabelled text. Using these patterns, we can discover more patterns and tuples. However, we have to be careful of semantic drift, in which one wrong tuple or pattern leads to further errors.
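Below is a simplified sketch of this bootstrapping loop. The toy corpus and the pattern format are illustrative assumptions; real systems score and filter patterns to limit semantic drift.

```python
import re

# Toy corpus; in practice this is a large unlabelled text collection.
corpus = [
    "Ryanair has a hub at Charleroi.",
    "Lufthansa has a hub at Frankfurt.",
    "Frankfurt is the main hub of Lufthansa.",
]

tuples = {("Ryanair", "Charleroi")}   # seed tuple: hub(Ryanair, Charleroi)
patterns = set()

for _ in range(2):                    # a couple of bootstrapping iterations
    # 1. Induce patterns from sentences that contain a known tuple.
    for org, loc in list(tuples):
        for sent in corpus:
            if org in sent and loc in sent:
                patterns.add(sent.replace(org, "{ORG}").replace(loc, "{LOC}"))
    # 2. Apply the patterns to discover new tuples.
    for pat in patterns:
        regex = (re.escape(pat)
                 .replace(r"\{ORG\}", r"(?P<org>\w+)")
                 .replace(r"\{LOC\}", r"(?P<loc>\w+)"))
        for sent in corpus:
            m = re.match(regex, sent)
            if m:
                tuples.add((m.group("org"), m.group("loc")))

print(tuples)  # seed plus newly discovered (org, loc) pairs
```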
What sort of features are useful for relation extraction?
Supervised learning uses features. The named entities themselves are useful features. This includes an entity's bag of words, head word, and entity type. It's also useful to look at words surrounding the entities, including words in between the two entities. Stems of these words can also be included. The distance between the entities could also be useful.
The syntactic structure of the sentence can signal the relations. A syntax tree could be obtained via base-phrase chunking, dependency parsing or full constituent parsing. The paths in these trees can be used to train binary classifiers to detect specific syntactic constructions. The accompanying figure shows possible features in the sentence "[ORG American Airlines], a unit of AMR Corp., immediately matched the move, spokesman [PERS Tim Wagner] said."
When using syntax, expert knowledge of linguistics is needed to know which syntactic constructions correspond to which relations. However, this can be automated via machine learning.
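As an illustration, here is a minimal sketch of such lexical features, assuming tokenized input and known entity spans; the function and feature names are made up for the example:

```python
def lexical_features(tokens, e1_span, e2_span):
    """Simple lexical features for a candidate entity pair.
    e1_span and e2_span are (start, end) token indices; illustrative only."""
    e1 = tokens[e1_span[0]:e1_span[1]]
    e2 = tokens[e2_span[0]:e2_span[1]]
    between = tokens[e1_span[1]:e2_span[0]]
    return {
        "e1_head": e1[-1],                   # head word of first entity
        "e2_head": e2[-1],                   # head word of second entity
        "bow_between": sorted(set(between)), # words between the entities
        "word_before_e1": tokens[e1_span[0] - 1] if e1_span[0] > 0 else "<s>",
        "word_after_e2": tokens[e2_span[1]] if e2_span[1] < len(tokens) else "</s>",
        "distance": e2_span[0] - e1_span[1], # token distance between entities
    }

tokens = "American Airlines , a unit of AMR Corp. , matched the move".split()
print(lexical_features(tokens, (0, 2), (6, 8)))
```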
Could you explain kernel-based methods for supervised relation classification?
Unlike feature-based methods, kernel-based methods don't require explicit feature engineering. They can explore a large feature space in polynomial computation time.
The essence of a kernel is to compute the similarity between two sequences. A kernel could be designed to measure structural similarity of character sequences, word sequences, or parse trees involving the entities. In practice, a kernel is used as a similarity function in classifiers such as SVM or Voted Perceptron.
- Subsequence: Uses a sequence of words made of the entities and their surrounding words. Word representation includes POS tag and entity type.
- Syntactic Tree: A constituent parse tree is used. Convolution Parse Tree Kernel is one way to compare similarity of two syntactic trees.
- Dependency Tree: Similarity is computed between two dependency parse trees. This could be enhanced with shallow semantic parsers. A variation is to use dependency graph paths in which the shortest path between entities represents a relation.
- Composite: Combines the above approaches. Subsequence kernels capture lexical information whereas tree kernels capture syntactic information.
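To make the idea concrete, here is a minimal sketch that plugs a crude word-overlap similarity (a stand-in for real subsequence or tree kernels) into scikit-learn's SVC in precomputed-kernel mode:

```python
import numpy as np
from sklearn.svm import SVC

# Toy training sentences (context around an entity pair) and relation labels.
sentences = [
    "X is located in Y", "X lies in Y",
    "X works at Y", "X is employed by Y",
]
labels = [0, 0, 1, 1]  # 0 = located-in, 1 = person-affiliation

def kernel(a, b):
    """Crude similarity: word overlap between two sequences.
    A stand-in for subsequence or tree kernels."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

# Precompute the Gram matrix of pairwise similarities.
gram = np.array([[kernel(a, b) for b in sentences] for a in sentences])
clf = SVC(kernel="precomputed").fit(gram, labels)

# Classify a new sentence by its similarity to all training sentences.
test = "X is based in Y"
test_gram = np.array([[kernel(test, b) for b in sentences]])
print(clf.predict(test_gram))  # expected: [0], closest to located-in examples
```

The precomputed-kernel mode is what lets the SVM consume structural similarities directly, without any explicit feature vectors.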
Could you explain the distant supervision approach to relation extraction?
Due to extensive work done for the Semantic Web, we already have many knowledge bases that contain entity-relation-entity triplets. Examples include DBpedia (3K relations), Freebase (38K relations), YAGO, and Google Knowledge Graph (35K relations). These can be used for relation extraction without requiring annotated text.
Distant supervision is a combination of unsupervised and supervised approaches. It extracts relations without supervision. It also induces thousands of features using a probabilistic classifier.
The process starts by linking named entities to those in the knowledge bases. Using relations in the knowledge base, the patterns are picked up in the text. Patterns are applied to find more relations. Early work used DBpedia and Freebase, and Wikipedia as the text corpus. Later work utilized semi-structured data (HTML tables, Wikipedia list pages, etc.) or even a web search to fill gaps in knowledge graphs.
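A simplified sketch of the distant supervision labelling step, assuming a toy knowledge base and corpus: any sentence mentioning both entities of a known triple becomes a (noisy) positive training example.

```python
# Knowledge base of (entity1, relation, entity2) triples; illustrative.
kb = [
    ("Gandhi", "Born-In", "1869"),
    ("Microsoft", "Headquartered-In", "Redmond"),
]

corpus = [
    "Gandhi was born in 1869 in Porbandar.",
    "Microsoft is headquartered in Redmond, Washington.",
    "Gandhi visited London in 1931.",
]

# Distant supervision assumption: a sentence containing both entities
# of a triple expresses that relation (noisy, but needs no annotation).
training_examples = []
for e1, rel, e2 in kb:
    for sent in corpus:
        if e1 in sent and e2 in sent:
            training_examples.append((sent, e1, e2, rel))

for ex in training_examples:
    print(ex)
```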
Could you compare some semi-supervised or unsupervised relation extraction tools?
DIPRE (1998) starts with seed relations, applies them to text, induces patterns, and applies the patterns to obtain more tuples. These steps are iterated. When applied to the (author, book) relation, patterns take the form (longest-common-suffix of prefix strings, author, middle, book, longest-common-prefix of suffix strings). DIPRE is an application of the Yarowsky algorithm (1995), invented for word sense disambiguation.
Like DIPRE, Snowball (2000) uses seed relations but doesn't look for exact pattern matches. Tuples are represented as vectors and grouped using similarity functions. Each term is also weighted, and weights are adjusted with each iteration. Snowball can thus handle variations in tokens or punctuation.
KnowItAll (2005) starts with domain-independent extraction patterns. Relation-specific and domain-specific rules are derived from the generic patterns. The rules are applied at large scale on online text. It uses a pointwise mutual information (PMI) measure to retain the most likely patterns and relations.
Unlike earlier algorithms, TextRunner (2007) doesn't require a pre-defined set of rules. It learns relations, classes and entities on its own from a large corpus.
How are neural networks being used to do relation extraction?
Neural networks were increasingly applied to relation extraction from the early 2010s. Early approaches used Recursive Neural Networks applied to syntactic parse trees. Convolutional Neural Networks (CNNs) came next, used to extract sentence-level features and the context surrounding words. A combination of these two networks has also been used.
Since CNNs failed to learn long-distance dependencies, Recurrent Neural Networks (RNNs) were found to be more effective in this regard. By 2017, basic RNNs gave way to gated variants called GRU and LSTM. A comparative study showed that CNNs are good at capturing local and position-invariant features whereas RNNs are better at capturing order information and long-range context dependencies.
The next evolution was towards attention mechanisms and pre-trained language models such as BERT. For example, an attention mechanism can pick out the most relevant words, and CNNs or LSTMs can then learn relations without explicit dependency trees. As of January 2020, BERT-based models represent the state of the art with F1 scores close to 90.
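The sketch below outlines a CNN-based relation classifier with word and position embeddings, loosely patterned on the CNN approaches mentioned above (PyTorch; all hyperparameters and dimensions are arbitrary assumptions, not a published architecture):

```python
import torch
import torch.nn as nn

class CNNRelationClassifier(nn.Module):
    """Minimal CNN for sentence-level relation classification.
    Word and position embeddings are concatenated, convolved,
    max-pooled over the sequence, then classified."""
    def __init__(self, vocab_size, n_relations, word_dim=50, pos_dim=5,
                 max_len=100, n_filters=100, window=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Position embeddings: relative distance to each of the two entities.
        self.pos1_emb = nn.Embedding(2 * max_len, pos_dim)
        self.pos2_emb = nn.Embedding(2 * max_len, pos_dim)
        self.conv = nn.Conv1d(word_dim + 2 * pos_dim, n_filters,
                              kernel_size=window, padding=1)
        self.fc = nn.Linear(n_filters, n_relations)

    def forward(self, words, pos1, pos2):
        # words, pos1, pos2: (batch, seq_len) index tensors
        x = torch.cat([self.word_emb(words),
                       self.pos1_emb(pos1),
                       self.pos2_emb(pos2)], dim=-1)
        x = x.transpose(1, 2)              # (batch, channels, seq_len)
        x = torch.relu(self.conv(x))
        x = x.max(dim=2).values            # max-pool over the sequence
        return self.fc(x)                  # relation logits

model = CNNRelationClassifier(vocab_size=5000, n_relations=19)
words = torch.randint(0, 5000, (2, 20))    # a batch of 2 toy sentences
pos1 = torch.randint(0, 200, (2, 20))      # shifted distances to entity 1
pos2 = torch.randint(0, 200, (2, 20))      # shifted distances to entity 2
print(model(words, pos1, pos2).shape)      # torch.Size([2, 19])
```

The position features tell the convolution where each token stands relative to the two entities, something plain word embeddings cannot convey.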
How do we evaluate algorithms for relation extraction?
Precision, recall and F-measure are typically used, computed against a gold standard of human-annotated relations. These measures suit supervised methods.
For unsupervised methods, it may be sufficient to check if a relation has been captured correctly. There's no need to check if every mention of the relation has been detected. Precision here is simply the number of extracted relations judged correct by human experts out of all relations extracted. Recall is more difficult to compute; gazetteers and web resources may be used for this purpose.
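A minimal sketch of these metrics, computed over sets of extracted relation tuples:

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over sets of relation tuples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)            # correctly extracted relations
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("CMU", "located-in", "Pittsburgh"),
        ("Gates", "works-at", "Microsoft"),
        ("Beijing", "capital-of", "China")}
predicted = {("CMU", "located-in", "Pittsburgh"),
             ("Gates", "works-at", "IBM")}
print(precision_recall_f1(predicted, gold))  # (0.5, 0.333..., 0.4)
```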
Could you mention some resources for working with relation extraction?
Papers With Code has useful links to recent publications on relation classification. GitHub has a topic page on relation classification. Another useful resource is a curated list of papers, tutorials and datasets.
The current state-of-the-art is captured on the NLP-progress page of relation extraction.
Among the useful datasets for training or evaluation are ACE-2005 (7 major relation types) and SemEval-2010 Task 8 (19 relation types). For distant supervision, the Riedel (NYT) dataset was formed by aligning Freebase relations with the New York Times corpus. There's also the Google Distant Supervision (GIDS) dataset and FewRel. TACRED is a large dataset containing 41 relation types from newswire and web text.
Milestones
2000
Agichtein and Gravano propose Snowball, a semi-supervised system that refines DIPRE-style bootstrapping. Tuples are represented as vectors and grouped using similarity functions, with pattern weights adjusted at each iteration.
2003
Zelenko et al. obtain shallow parse trees from text for use in binary relation classification. They use contiguous and sparse subtree kernels to assess similarity of two parse trees. Subsequently, this kernel-based approach is followed by other researchers: kernels on dependency parse trees of Culotta and Sorensen (2004); subsequence and shortest dependency path kernels of Bunescu and Mooney (2005); convolutional parse kernels of Zhang et al. (2006); and composite kernels of Choi et al. (2009).
2004
Kambhatla takes a feature-based supervised classifier approach to relation extraction. A MaxEnt model is used along with lexical, syntactic and semantic features. Since kernel methods are a generalization of feature-based algorithms, Zhao and Grishman (2005) extend Kambhatla's work by including more syntactic features using kernels, then use SVM to pick out the most suitable features.
2005
Since binary classifiers have been well studied, McDonald et al. cast the problem of extracting higher-order relations into many binary relations. This also makes the data less sparse and eases computation. Binary relations are represented as a graph, from which cliques are extracted. They find that probabilistic cliques perform better than maximal cliques. The figure corresponds to some binary relations extracted for the sentence "John and Jane are CEOs at Inc. Corp. and Biz. Corp. respectively."
2007
Banko et al. propose Open Information Extraction along with an implementation that they call TextRunner. In an unsupervised manner, the system is able to extract relations without any human input. Each tuple is assigned a probability and indexed for efficient information retrieval. TextRunner has three components: self-supervised learner, single-pass extractor, and redundancy-based assessor.
2009
Mintz et al. propose distant supervision to avoid the cost of producing a hand-annotated corpus. Using entity pairs that appear in Freebase, they find all sentences in which each pair occurs in unlabelled text, extract textual features and train a relation classifier. They include both lexical and syntactic features, noting that syntactic features are useful when patterns are nearby in the dependency tree but distant in terms of words. In the early 2010s, distant supervision becomes an active area of research.
2014
Neural networks and word embeddings were first explored by Collobert et al. (2011) for a number of NLP tasks. Zeng et al. apply word embeddings and Convolutional Neural Network (CNN) to relation classification. They treat relation classification as a multi-class classification problem. Lexical features include the entities, their surrounding tokens, and WordNet hypernyms. CNN is used to extract sentence level features, for which each token is represented as word features (WF) and position features (PF).
2015
Song et al. present PKDE4J, a framework for dictionary-based entity extraction and rule-based relation extraction. Primarily meant for the biomedical field, the framework reports F-measures of 85% for entity extraction and 81% for relation extraction. The RE algorithm analyzes dependency parse trees to extract heuristic rules. The authors come up with 17 rules that can be applied to discern relations, covering the verb in the dependency path, nominalization, negation, active/passive voice, entity order, and so on.
2016
Miwa and Bansal propose to jointly model the tasks of NER and RE. A BiLSTM is used on word sequences to obtain the named entities. Another BiLSTM is used on dependency tree structures to obtain the relations. They also find that shortest path dependency tree performs better than subtrees of full trees.
2019
Wu and He apply BERT pre-trained language model to relation extraction. They call their model R-BERT. Named entities are identified beforehand and are delimited with special tokens. Since an entity can span multiple tokens, their start/end hidden token representations are averaged. The output is a softmax layer with cross-entropy as the loss function. On SemEval-2010 Task 8, R-BERT achieves state-of-the-art Macro-F1 score of 89.25. Other BERT-based models learn NER and RE jointly, or rely on topological features of an entity pair graph.
References
- Agichtein, Eugene, and Luis Gravano. 2000. "Snowball: extracting relations from large plain-text collections." Proceedings of the fifth ACM conference on Digital libraries, pp. 85-94, June. https://doi.org/10.1145/336597.336644. Accessed 2020-02-04.
- Bach, Nguyen, and Sameer Badaskar. 2007a. "A Survey on Relation Extraction." Carnegie Mellon University. Accessed 2020-02-02.
- Bach, Nguyen, and Sameer Badaskar. 2007b. "A Survey on Relation Extraction: Slides." Carnegie Mellon University. Accessed 2020-02-02.
- Banko, Michele, Michael J Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. 2007. "Open information extraction from the web." Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2670-2676, January. Accessed 2020-02-04.
- Chinchor, Nancy A. 1998. "Overview of MUC-7." Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29-May 1. Accessed 2020-02-02.
- Eberts, Markus, and Adrian Ulges. 2019. "Span-based Joint Entity and Relation Extraction with Transformer Pre-training." arXiv, v3, November 19. Accessed 2020-02-02.
- Herman, Andreas. 2019. "Different ways of doing Relation Extraction from text." Medium, May 3. Accessed 2020-02-02.
- Jung, Hanmin, Sung-Pil Choi, Seungwoo Lee, and Sa-Kwang Song. 2012. "Survey on Kernel-Based Relation Extraction." IntechOpen, November 21. Accessed 2020-02-02.
- Jurafsky, Daniel and James H. Martin. 2009. "Information Extraction." Chapter 22 in Speech and Language Processing, Second Edition, Prentice-Hall, Inc. Accessed 2020-02-02.
- Kambhatla, Nanda. 2004. "Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction." Proceedings of the ACL Interactive Poster and Demonstration Sessions, pp. 178-181, July. Accessed 2020-02-02.
- Lee, Joohong. 2019. "Awesome Relation Extraction." roomylee/awesome-relation-extraction, on GitHub, October 9. Accessed 2020-02-02.
- Lee, Joohong, Sangwoo Seo, and Yong Suk Choi. 2019. "Semantic Relation Classification via Bidirectional LSTM Networks with Entity-aware Attention using Latent Entity Typing." arXiv, v1, January 23. Accessed 2020-02-02.
- Liu, Yudong, Zhongmin Shi, and Anoop Sarkar. 2007. "Exploiting Rich Syntactic Information for Relationship Extraction from Biomedical Articles." Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, pp. 97-100, April. Accessed 2020-02-04.
- Liu, Yang, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang. 2015. "A Dependency-Based Neural Network for Relation Classification." Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 285-290, July. Accessed 2020-02-02.
- McDonald, Ryan, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, and Pete White. 2005. "Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE." Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 491-498, June. Accessed 2020-02-04.
- Mintz, Mike, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. "Distant supervision for relation extraction without labeled data." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp. 1003-1011, August. Accessed 2020-02-02.
- Miwa, Makoto, and Mohit Bansal. 2016. "End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1105-1116, August. Accessed 2020-02-04.
- Mosic, Ranko. 2018. "Building Commoditized Sophisticated Financial NLP Applications With Google Cloud NLP." Medium, June 12. Accessed 2020-02-04.
- Paulheim, Heiko. 2017. "Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods." Semantic Web, IOS Press, vol. 8, no. 3, pp. 489-508. Accessed 2020-02-02.
- Pawar, Sachin, Girish K. Palshikar, and Pushpak Bhattacharyya. 2017. "Relation Extraction: A Survey." arXiv, v1, December 14. Accessed 2020-02-02.
- Ruder, Sebastian. 2020. "Relationship Extraction." NLP-progress, February 2. Accessed 2020-02-02.
- Shen, Yatian, and Xuanjing Huang. 2016. "Attention-Based Convolutional Neural Network for Semantic Relation Extraction." Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2526-2536, December. Accessed 2020-02-04.
- Shi, Yong, Yang Xiao, and Lingfeng Niu. 2019. "A Brief Survey of Relation Extraction based on Distant Supervision." International Conference on Computational Science, pp. 292-303. Accessed 2020-02-02.
- Song, Min, Won Chul Kim, Dahee Lee, Go Eun Heo, and Keun Young Kang. 2015. "PKDE4J: Entity and relation extraction for public knowledge discovery." Journal of Biomedical Informatics, Elsevier, vol. 57, pp. 320-332, October. Accessed 2020-02-02.
- Wang, Mengqiu. 2008. "A Re-examination of Dependency Path Kernels for Relation Extraction." Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II. Accessed 2020-02-04.
- Wu, Shanchan and Yifan He. 2019. "Enriching Pre-trained Language Model with Entity Information for Relation Classification." arXiv, v1, May 20. Accessed 2020-02-04.
- Yin, Wenpeng, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. "Comparative Study of CNN and RNN for Natural Language Processing." arXiv, v1, February 7. Accessed 2020-02-02.
- Zelenko, Dmitry, Chinatsu Aone, and Anthony Richardella. 2003. "Kernel Methods for Relation Extraction." Journal of Machine Learning Research, vol. 3, pp. 1083-1106, February. Accessed 2020-02-04.
- Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. "Relation Classification via Convolutional Deep Neural Network." Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 2335-2344, August. Accessed 2020-02-02.
- Zhang, Dongxu, and Dong Wang. 2015. "Relation Classification via Recurrent Neural Network." arXiv, v2, December 25. Accessed 2020-02-02.
- Zhao, Shubin, and Ralph Grishman. 2005. "Extracting Relations with Integrated Information Using Kernel Methods." Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 419-426, June. Accessed 2020-02-04.
- Zhao, Yi, Huaiyu Wan, Jianwei Gao, and Youfang Lin. 2019. "Improving Relation Classification by Entity Pair Graph." Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR, vol. 101, pp. 1156-1171. Accessed 2020-02-02.
Further Reading
- Herman, Andreas. 2019. "Different ways of doing Relation Extraction from text." Medium, May 3. Accessed 2020-02-02.
- Pawar, Sachin, Girish K. Palshikar, and Pushpak Bhattacharyya. 2017. "Relation Extraction: A Survey." arXiv, v1, December 14. Accessed 2020-02-02.
- Bach, Nguyen, and Sameer Badaskar. 2007a. "A Survey on Relation Extraction." Carnegie Mellon University. Accessed 2020-02-02.
- Shi, Yong, Yang Xiao, and Lingfeng Niu. 2019. "A Brief Survey of Relation Extraction based on Distant Supervision." International Conference on Computational Science, pp. 292-303. Accessed 2020-02-02.
- Jung, Hanmin, Sung-Pil Choi, Seungwoo Lee, and Sa-Kwang Song. 2012. "Survey on Kernel-Based Relation Extraction." IntechOpen, November 21. Accessed 2020-02-02.
- Bunescu, Razvan C., and Raymond J. Mooney. 2005. "Subsequence Kernels for Relation Extraction." Advances in Neural Information Processing Systems 18 (NIPS 2005). Accessed 2020-02-04.
See Also
- Named Entity Recognition
- Semantic Role Labelling
- Natural Language Parsing
- Coreference Resolution
- Text Summarization
- Question Answering