# Relation Extraction

Consider the phrase "President Clinton was in Washington today". This describes a Located relation between Clinton and Washington. Another example is "Steve Ballmer, CEO of Microsoft, said…", which describes a Role relation of Steve Ballmer within Microsoft.

The task of extracting semantic relations between entities in text is called Relation Extraction (RE). While Named Entity Recognition (NER) is about identifying entities in text, RE is about finding the relations among the entities. Given unstructured text, NER and RE help us obtain useful structured representations. Both tasks are part of the discipline of Information Extraction (IE).

Supervised, semi-supervised, and unsupervised approaches exist to do RE. In the 2010s, neural network architectures were applied to RE. Sometimes the term Relation Classification is used, particularly in approaches that treat it as a classification problem.

## Discussion

• What sort of relations are captured in relation extraction?

Here are some relations with examples:

• located-in: CMU is in Pittsburgh
• father-of: Manuel Blum is the father of Avrim Blum
• person-affiliation: Bill Gates works at Microsoft Inc.
• capital-of: Beijing is the capital of China
• part-of: American Airlines, a unit of AMR Corp., immediately matched the move

In general, affiliations involve persons, organizations or artifacts. Geospatial relations involve locations. Part-of relations involve organizations or geo-political entities.

An entity tuple is the common way to represent entities bound in a relation. Given n entities in a relation r, the notation is $$r(e_{1},e_{2},\ldots,e_{n})$$. An example use of this notation is Located-In(CMU, Pittsburgh).

RE mostly deals with binary relations where n=2. For n>2, the term used is higher-order relations. An example of a 4-ary biomedical relation is point_mutation(codon, 12, G, T), in the sentence "At codons 12, the occurrence of point mutations from G to T were observed".
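The tuple notation maps naturally onto a small data structure. The following is a minimal sketch; the `Relation` class and its `arity` property are hypothetical names introduced here for illustration, not part of any RE library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relation:
    """A relation r(e_1, ..., e_n): a name plus an ordered tuple of entities."""
    name: str
    args: Tuple[str, ...]

    @property
    def arity(self) -> int:
        # n = 2 for binary relations, n > 2 for higher-order relations
        return len(self.args)

    def __str__(self) -> str:
        return f"{self.name}({', '.join(self.args)})"


located_in = Relation("Located-In", ("CMU", "Pittsburgh"))
point_mutation = Relation("point_mutation", ("codon", "12", "G", "T"))

print(located_in, "has arity", located_in.arity)
print(point_mutation, "has arity", point_mutation.arity)
```

Because the class is frozen, instances are hashable and can be stored in a set or used as dictionary keys when building a simple relation store.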

• What are some common applications of relation extraction?

Since structured information is easier to use than unstructured text, relation extraction is useful in many NLP applications. RE enriches existing information. Once relations are obtained, they can be stored in databases for future queries. They can be visualized and correlated with other information in the system.

In question answering, one might ask "When was Gandhi born?" Such a factoid question can be answered if our relation database has stored the relation Born-In(Gandhi, 1869).

In the biomedical domain, protein binding relations can lead to drug discovery. When relations are extracted from a sentence such as "Gene X with mutation Y leads to malignancy Z", these relations can help us detect cancerous genes. Another example is to know the location of a protein in an organism. This ternary relation is split into two binary relations (Protein-Organism and Protein-Location). Once these are classified, the results are merged into a ternary relation.
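The merge step can be pictured as a join on the shared protein argument. This is a naive sketch under that assumption; the text above does not say how the merge is actually done, and the function name and the placeholder entities (`P1`, `P2`, and so on) are made up for illustration.

```python
def merge_binary(protein_organism, protein_location):
    """Join two lists of binary tuples on the protein they share.

    protein_organism: iterable of (protein, organism)
    protein_location: iterable of (protein, location)
    Returns a sorted list of (protein, organism, location) tuples.
    """
    locations_by_protein = {}
    for protein, location in protein_location:
        locations_by_protein.setdefault(protein, []).append(location)

    return sorted(
        (protein, organism, location)
        for protein, organism in protein_organism
        for location in locations_by_protein.get(protein, [])
    )


ternary = merge_binary(
    [("P1", "yeast"), ("P2", "human")],
    [("P1", "nucleus"), ("P3", "membrane")],
)
print(ternary)
```

A plain join like this can over-generate when a protein has several organisms and locations, so a real system would need extra evidence to decide which combinations hold together.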

• Which are the main techniques for doing relation extraction?

With supervised learning, the model is trained on annotated text. Entities and their relations are annotated. Training involves a binary classifier that detects the presence of a relation, and a classifier to label the relation. For labelling, we could use SVMs, decision trees, Naive Bayes or MaxEnt. Two types of supervision are feature-based and kernel-based.

Since finding large annotated datasets is difficult, a semi-supervised approach is more practical. One approach is to do a phrasal search with wildcards. For example, [ORG] has a hub at [LOC] would return organizations and their hub locations. If we relax the pattern, we'll get more matches but also false positives.
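A phrasal search with typed wildcards can be sketched with regular expressions. The sketch assumes the text has already been tagged with inline entity markers of the form `[ORG name]`; the helper `compile_pattern` and the example sentences are made up for illustration.

```python
import re


def compile_pattern(template):
    """Turn a template such as '[ORG] has a hub at [LOC]' into a regex.

    Each [TYPE] placeholder becomes a wildcard matching one tagged entity
    of that type, written in the text as '[TYPE some name]'.
    """
    regex = ""
    # re.split with a capturing group keeps the placeholders in the result.
    for part in re.split(r"(\[[A-Z]+\])", template):
        placeholder = re.fullmatch(r"\[([A-Z]+)\]", part)
        if placeholder:
            regex += r"\[" + placeholder.group(1) + r" ([^\]]+)\]"
        else:
            regex += re.escape(part)
    return re.compile(regex)


text = (
    "[ORG Ryanair] has a hub at [LOC Charleroi]. "
    "[ORG Delta] has a hub at [LOC Atlanta]. "
    "[ORG Acme] has an office in [LOC Paris]."
)

strict = compile_pattern("[ORG] has a hub at [LOC]")
strict_matches = strict.findall(text)

# A relaxed pattern: any words may appear between "has" and the location.
relaxed = re.compile(r"\[ORG ([^\]]+)\] has [^\[]*\[LOC ([^\]]+)\]")
relaxed_matches = relaxed.findall(text)

print(strict_matches)
print(relaxed_matches)
```

The relaxed pattern also picks up the Acme sentence, which mentions an office rather than a hub. This is the trade-off between coverage and false positives described above.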

An alternative is to use a set of specific patterns, induced from an initial set of seed patterns and seed tuples. This approach is called bootstrapping. For example, given the seed tuple hub(Ryanair, Charleroi) we can discover many phrasal patterns in unlabelled text. Using these patterns, we can discover more patterns and tuples. However, we have to be careful of semantic drift, in which one wrong tuple/pattern can lead to further errors.
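The bootstrapping loop can be sketched in a few lines. This is a deliberately simplified illustration: a pattern is just the literal text between an `[ORG …]` and a `[LOC …]` mention, the sentences are invented, and there is no scoring step to guard against semantic drift.

```python
import re

# One organization mention, the text in between, then one location mention.
PAIR = re.compile(r"\[ORG ([^\]]+)\](.+?)\[LOC ([^\]]+)\]")


def bootstrap(sentences, seed_tuples, iterations=2):
    """Alternate between inducing patterns from known tuples and
    harvesting new tuples with the known patterns."""
    tuples, patterns = set(seed_tuples), set()
    for _ in range(iterations):
        # Step 1: a sentence mentioning a known tuple yields a pattern.
        for sentence in sentences:
            for org, middle, loc in PAIR.findall(sentence):
                if (org, loc) in tuples:
                    patterns.add(middle)
        # Step 2: a sentence matching a known pattern yields a tuple.
        for sentence in sentences:
            for org, middle, loc in PAIR.findall(sentence):
                if middle in patterns:
                    tuples.add((org, loc))
    return tuples, patterns


sentences = [
    "[ORG Ryanair] has a hub at [LOC Charleroi].",
    "[ORG Delta] has a hub at [LOC Atlanta].",
    "[ORG Delta] operates its main hub in [LOC Atlanta].",
    "[ORG KLM] operates its main hub in [LOC Amsterdam].",
    "[ORG Acme] was fined in [LOC Paris].",
]
hub_tuples, hub_patterns = bootstrap(sentences, {("Ryanair", "Charleroi")})
print(sorted(hub_tuples))
print(sorted(hub_patterns))
```

The first iteration learns one pattern from the seed and finds the Delta tuple. The second iteration uses that tuple to learn a second pattern, which in turn finds the KLM tuple. A wrong tuple accepted early would propagate in exactly the same way, which is why practical systems score and prune patterns at each round.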

• What sort of features are useful for relation extraction?

Supervised learning uses features. The named entities themselves are useful features. This includes an entity's bag of words, head words and its entity type. It's also useful to look at words surrounding the entities, including words that are in between the two entities. Stems of these words can also be included. The distance between the entities could be useful.
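As a rough sketch, these lexical features can be collected into a dictionary for each candidate entity pair. The function name, the feature names, and the choice of the last token as a crude "head word" are all assumptions made for illustration.

```python
def pair_features(tokens, e1, e2):
    """Lexical features for an ordered entity pair.

    e1 and e2 are (start, end, entity_type) with end exclusive;
    e1 is assumed to appear before e2 in the sentence.
    """
    start1, end1, type1 = e1
    start2, end2, type2 = e2
    between = tokens[end1:start2]
    return {
        "e1_type": type1,
        "e2_type": type2,
        "e1_words": tuple(tokens[start1:end1]),
        "e2_words": tuple(tokens[start2:end2]),
        # Crude approximation: treat the last token as the head word.
        "e1_head": tokens[end1 - 1],
        "e2_head": tokens[end2 - 1],
        "words_between": tuple(w.lower() for w in between),
        "distance": len(between),
        "word_before_e1": tokens[start1 - 1] if start1 > 0 else "<s>",
        "word_after_e2": tokens[end2] if end2 < len(tokens) else "</s>",
    }


sentence = ("American Airlines , a unit of AMR Corp. , immediately matched "
            "the move , spokesman Tim Wagner said .")
tokens = sentence.split()
features = pair_features(tokens, (0, 2, "ORG"), (15, 17, "PERS"))
for name, value in features.items():
    print(name, "=", value)
```

A dictionary of this shape can be turned into a sparse vector by any standard feature vectorizer before training a classifier.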

The syntactic structure of the sentence can signal the relations. A syntax tree could be obtained via base-phrase chunking, dependency parsing or full constituent parsing. The paths in these trees can be used to train binary classifiers to detect specific syntactic constructions. The accompanying figure shows possible features in the sentence "[ORG American Airlines], a unit of AMR Corp., immediately matched the move, spokesman [PERS Tim Wagner] said."

When using syntax, expert knowledge of linguistics is needed to know which syntactic constructions correspond to which relations. However, this can be automated via machine learning.

• Could you explain kernel-based methods for supervised relation classification?

Unlike feature-based methods, kernel-based methods don't require explicit feature engineering. They can explore a large feature space in polynomial computation time.

The essence of a kernel is to compute the similarity between two sequences. A kernel could be designed to measure structural similarity of character sequences, word sequences, or parse trees involving the entities. In practice, a kernel is used as a similarity function in classifiers such as SVM or Voted Perceptron.
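To make the idea of a kernel as a similarity function concrete, here is a sketch that counts shared contiguous n-grams between two token sequences. It is a simplified stand-in and not one of the published RE kernels, which typically allow gaps or operate on trees; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, max_n=2):
    """Count all contiguous n-grams of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def kernel(x, y, max_n=2):
    """Inner product of the two n-gram count vectors."""
    cx, cy = ngram_counts(x, max_n), ngram_counts(y, max_n)
    return sum(cx[g] * cy[g] for g in cx.keys() & cy.keys())


def normalized_kernel(x, y, max_n=2):
    """Scale the kernel so that a non-empty sequence compared with itself gives 1."""
    return kernel(x, y, max_n) / sqrt(kernel(x, x, max_n) * kernel(y, y, max_n))


a = "ORG has a hub at LOC".split()
b = "ORG has its hub at LOC".split()
c = "PER was born in LOC".split()

print(normalized_kernel(a, b))
print(normalized_kernel(a, c))
```

The two hub sentences score much higher against each other than against the birth sentence. Since the function is an inner product over explicit count vectors, it is a valid kernel, and a matrix of its pairwise values could be passed to a classifier that accepts precomputed kernels.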

We note a few kernel designs:

• Subsequence: Uses a sequence of words made of the entities and their surrounding words. Word representation includes POS tag and entity type.
• Syntactic Tree: A constituent parse tree is used. Convolution Parse Tree Kernel is one way to compare similarity of two syntactic trees.
• Dependency Tree: Similarity is computed between two dependency parse trees. This could be enhanced with shallow semantic parsers. A variation is to use dependency graph paths in which the shortest path between entities represents a relation.
• Composite: Combines the above approaches. Subsequence kernels capture lexical information whereas tree kernels capture syntactic information.

• Could you explain the distant supervision approach to relation extraction?

Due to extensive work done for the Semantic Web, we already have many knowledge bases that contain entity-relation-entity triplets. Examples include DBpedia (3K relations), Freebase (38K relations), YAGO, and Google Knowledge Graph (35K relations). These can be used for relation extraction without requiring annotated text.

Distant supervision is a combination of unsupervised and supervised approaches. It extracts relations without supervision. It also induces thousands of features using a probabilistic classifier.

The process starts by linking named entities in the text to those in the knowledge bases. Using relations in the knowledge base, patterns are picked up in the text. These patterns are then applied to find more relations. Early work used DBpedia and Freebase as knowledge bases, with Wikipedia as the text corpus. Later work utilized semi-structured data (HTML tables, Wikipedia list pages, etc.) or even a web search to fill gaps in knowledge graphs.
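The labelling step at the heart of distant supervision can be sketched as follows. The assumption here is that entity linking has already been done, so each sentence arrives with its list of linked entities; the function name, the toy knowledge base and the third sentence are invented for illustration.

```python
from itertools import permutations


def distant_labels(sentences, kb):
    """Label every sentence that mentions an entity pair found in the KB.

    sentences: list of (text, [linked entity names])
    kb: dict mapping (entity1, entity2) -> relation name
    Returns a list of (text, entity1, entity2, relation) training examples.
    """
    examples = []
    for text, entities in sentences:
        for e1, e2 in permutations(entities, 2):
            relation = kb.get((e1, e2))
            if relation is not None:
                examples.append((text, e1, e2, relation))
    return examples


kb = {
    ("Beijing", "China"): "capital-of",
    ("CMU", "Pittsburgh"): "located-in",
}
sentences = [
    ("Beijing is the capital of China.", ["Beijing", "China"]),
    ("CMU is in Pittsburgh.", ["CMU", "Pittsburgh"]),
    ("Beijing hosted a summit attended by China and Japan.",
     ["Beijing", "China", "Japan"]),
]
examples = distant_labels(sentences, kb)
for example in examples:
    print(example)
```

The third sentence is labelled capital-of even though it does not express that relation. Such noisy labels are the price of skipping manual annotation, and a classifier trained on these examples has to be robust to them.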

• Could you compare some semi-supervised or unsupervised approaches of some relation extraction tools?

DIPRE's algorithm (1998) starts with seed relations, applies them to text, induces patterns, and applies the patterns to obtain more tuples. These steps are iterated. When applied to the (author, book) relation, patterns take the form (longest-common-suffix of prefix strings, author, middle, book, longest-common-prefix of suffix strings). DIPRE is an application of the Yarowsky algorithm (1995), which was invented for word sense disambiguation (WSD).
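The pattern form described for DIPRE can be illustrated with a short sketch. It assumes that each occurrence of a seed tuple has already been reduced to its (prefix, middle, suffix) context strings, with the author appearing before the book. The function names and the HTML-like contexts are made up, and details of the original algorithm beyond the pattern form given above are omitted.

```python
import re


def common_prefix(strings):
    """Longest string that every input starts with."""
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def common_suffix(strings):
    """Longest string that every input ends with."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def induce_pattern(occurrences):
    """occurrences: list of (prefix, middle, suffix) contexts.
    Returns (prefix, middle, suffix) or None if the middles differ."""
    middles = {middle for _, middle, _ in occurrences}
    if len(middles) != 1:
        return None
    prefixes = [prefix for prefix, _, _ in occurrences]
    suffixes = [suffix for _, _, suffix in occurrences]
    return (common_suffix(prefixes), middles.pop(), common_prefix(suffixes))


def apply_pattern(pattern, text):
    """Find new (author, book) pairs that fit the induced pattern."""
    prefix, middle, suffix = pattern
    regex = (re.escape(prefix) + "(.+?)" + re.escape(middle)
             + "(.+?)" + re.escape(suffix))
    return re.findall(regex, text)


occurrences = [
    ("Fiction <li>", ", <i>", "</i> (novel)"),   # context of one seed tuple
    ("Classics <li>", ", <i>", "</i></li>"),     # context of another seed tuple
]
pattern = induce_pattern(occurrences)
found = apply_pattern(pattern, "Also <li>Mary Shelley, <i>Frankenstein</i> and more")
print(pattern)
print(found)
```

Taking the longest common suffix and prefix keeps only the context that all occurrences agree on, which makes the pattern general enough to match unseen tuples.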

Like DIPRE, Snowball (2000) uses seed relations but doesn't look for exact pattern matches. Tuples are represented as vectors, grouped using similarity functions. Each term is also weighted. Weights are adjusted with each iteration. Snowball can handle variations in tokens or punctuation.
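The move from exact matching to vector similarity can be sketched with cosine similarity over weighted term vectors. This is a simplification under stated assumptions: a single context vector is used per occurrence, the weights are plain counts, and the function name and example vectors are hypothetical rather than taken from Snowball itself.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Contexts between an organization and a location, as bags of terms.
pattern_context = {"has": 1.0, "a": 1.0, "hub": 1.0, "at": 1.0}
similar_context = {"has": 1.0, "its": 1.0, "hub": 1.0, "at": 1.0}
other_context = {"was": 1.0, "fined": 1.0, "in": 1.0}

print(cosine(pattern_context, similar_context))
print(cosine(pattern_context, other_context))
```

With a similarity threshold in place of exact string equality, a context that differs by one token can still match the pattern, which is how small variations in tokens or punctuation are tolerated.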

KnowItAll (2005) starts with domain-independent extraction patterns. Relation-specific and domain-specific rules are derived from the generic patterns. The rules are applied on a large scale on online text. It uses pointwise mutual information (PMI) measure to retain the most likely patterns and relations.
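For reference, the textbook definition of PMI can be computed from co-occurrence counts as below. This is the generic formula and not necessarily the exact variant used in KnowItAll; the function name and the counts are illustrative.

```python
from math import log2


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2( P(x, y) / (P(x) * P(y)) ).

    count_xy: number of observations containing both x and y
    count_x, count_y: number of observations containing x, and y
    total: total number of observations
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return log2(p_xy / (p_x * p_y))


# x and y co-occur ten times more often than chance would predict.
associated = pmi(count_xy=10, count_x=20, count_y=50, total=1000)
# x and y co-occur exactly as often as chance would predict.
independent = pmi(count_xy=1, count_x=20, count_y=50, total=1000)

print(associated, independent)
```

A high PMI between a candidate extraction and a phrase that typically accompanies true instances is evidence that the extraction is correct, while a value near zero suggests the co-occurrence is coincidental.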

Unlike earlier algorithms, TextRunner (2007) doesn't require a pre-defined set of rules. It learns relations, classes and entities on its own from a large corpus.

• How are neural networks being used to do relation extraction?

Neural networks were increasingly applied to relation extraction from the early 2010s. Early approaches used Recursive Neural Networks that were applied to syntactic parse trees. The use of Convolutional Neural Networks (CNNs) came next, to extract sentence-level features and the context surrounding words. A combination of these two networks has also been used.

Since CNNs struggled to learn long-distance dependencies, Recurrent Neural Networks (RNNs) were found to be more effective in this regard. By 2017, basic RNNs gave way to gated variants called GRU and LSTM. A comparative study showed that CNNs are good at capturing local and position-invariant features whereas RNNs are better at capturing order information and long-range context dependency.

The next evolution was towards attention mechanisms and pre-trained language models such as BERT. For example, an attention mechanism can pick out the most relevant words, and CNNs or LSTMs can then learn relations from them. Thus, we don't need explicit dependency trees. As of January 2020, BERT-based models represented the state of the art, with F1 scores close to 90.

• How do we evaluate algorithms for relation extraction?

Recall, precision and F-measures are typically used to evaluate against a gold standard of human-annotated relations. These are typically used for supervised methods.
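Treating both the system output and the gold standard as sets of relation tuples, the three measures can be computed as in this sketch. The function name is hypothetical, the gold tuples reuse examples from this article, and the predictions are invented.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted relation tuples against gold-standard tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {
    ("located-in", "CMU", "Pittsburgh"),
    ("father-of", "Manuel Blum", "Avrim Blum"),
    ("person-affiliation", "Bill Gates", "Microsoft Inc."),
    ("capital-of", "Beijing", "China"),
}
predicted = {
    ("located-in", "CMU", "Pittsburgh"),
    ("capital-of", "Beijing", "China"),
    ("capital-of", "Pittsburgh", "China"),  # a wrong extraction
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of three predictions are correct and two of four gold relations are found, so precision is 2/3 and recall is 1/2.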

For unsupervised methods, it may be sufficient to check if a relation has been captured correctly. There's no need to check if every mention of the relation has been detected. Precision here is simply the correct relations against all relations as judged by human experts. Recall is more difficult to compute. Gazetteers and web resources may be used for this purpose.

• Could you mention some resources for working with relation extraction?

Papers With Code has useful links to recent publications on relation classification. GitHub has a topic page on relation classification. Another useful resource is a curated list of papers, tutorials and datasets.

The current state-of-the-art is captured on the NLP-progress page of relation extraction.

Among the useful datasets for training or evaluation are ACE-2005 (7 major relation types) and SemEval-2010 Task 8 (19 relation types). For distant supervision, the Riedel (or NYT) dataset was formed by aligning Freebase relations with the New York Times corpus. There are also the Google Distant Supervision (GIDS) dataset and FewRel. TACRED is a large dataset containing 41 relation types from newswire and web text.

## Milestones

1998

At the 7th Message Understanding Conference (MUC), the task of extracting relations between entities is considered. Since this is considered as part of template filling, they call it template relations. Relations are limited to organizations: employee_of, product_of, and location_of.

Jun
2000

Agichtein and Gravano propose Snowball, a semi-supervised approach to generating patterns and extracting relations from a small set of seed relations. At each iteration, it evaluates for quality and keeps only the most reliable patterns and relations.

Feb
2003

Zelenko et al. obtain shallow parse trees from text for use in binary relation classification. They use contiguous and sparse subtree kernels to assess similarity of two parse trees. Subsequently, this kernel-based approach is followed by other researchers: kernels on dependency parse trees of Culotta and Sorensen (2004); subsequence and shortest dependency path kernels of Bunescu and Mooney (2005); convolutional parse kernels of Zhang et al. (2006); and composite kernels of Choi et al. (2009).

2004

Kambhatla takes a feature-based supervised classifier approach to relation extraction. A MaxEnt model is used along with lexical, syntactic and semantic features. Since kernel methods are a generalization of feature-based algorithms, Zhao and Grishman (2005) extend Kambhatla's work by including more syntactic features using kernels, then use SVM to pick out the most suitable features.

Jun
2005

Since binary classifiers have been well studied, McDonald et al. cast the problem of extracting higher-order relations into many binary relations. This also makes the data less sparse and eases computation. Binary relations are represented as a graph, from which cliques are extracted. They find that probabilistic cliques perform better than maximal cliques. The figure corresponds to some binary relations extracted for the sentence "John and Jane are CEOs at Inc. Corp. and Biz. Corp. respectively."
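The idea of recovering higher-order relations from a graph of binary relations can be sketched with maximal clique enumeration. The sketch uses the basic Bron-Kerbosch algorithm and covers only the maximal-clique variant, not the probabilistic cliques that performed better. The edge list is an assumption about which binary relations would be extracted from the example sentence.

```python
def maximal_cliques(edges):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    cliques = []

    def expand(clique, candidates, excluded):
        # No vertex left to add or to rule out: the clique is maximal.
        if not candidates and not excluded:
            cliques.append(frozenset(clique))
            return
        for vertex in list(candidates):
            expand(clique | {vertex},
                   candidates & neighbours[vertex],
                   excluded & neighbours[vertex])
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    expand(set(), set(neighbours), set())
    return cliques


# Assumed binary relations for the John and Jane sentence.
edges = [
    ("John", "CEO"), ("John", "Inc. Corp."), ("CEO", "Inc. Corp."),
    ("Jane", "CEO"), ("Jane", "Biz. Corp."), ("CEO", "Biz. Corp."),
]
cliques = maximal_cliques(edges)
for clique in cliques:
    print(sorted(clique))
```

Each maximal clique groups a person, a job title and a company that are pairwise related, which reconstructs the two ternary relations expressed by the sentence.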

Jan
2007

Banko et al. propose Open Information Extraction along with an implementation that they call TextRunner. In an unsupervised manner, the system is able to extract relations without any human input. Each tuple is assigned a probability and indexed for efficient information retrieval. TextRunner has three components: self-supervised learner, single-pass extractor, and redundancy-based assessor.

Aug
2009

Mintz et al. propose distant supervision to avoid the cost of producing a hand-annotated corpus. Using entity pairs that appear in Freebase, they find all sentences in which each pair occurs in unlabelled text, extract textual features and train a relation classifier. They include both lexical and syntactic features. They note that syntactic features are useful when patterns are nearby in the dependency tree but distant in terms of words. In the early 2010s, distant supervision becomes an active area of research.

Aug
2014

Neural networks and word embeddings were first explored by Collobert et al. (2011) for a number of NLP tasks. Zeng et al. apply word embeddings and Convolutional Neural Network (CNN) to relation classification. They treat relation classification as a multi-class classification problem. Lexical features include the entities, their surrounding tokens, and WordNet hypernyms. CNN is used to extract sentence level features, for which each token is represented as word features (WF) and position features (PF).
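Position features are commonly computed as the relative offset of each token from the two entity mentions. The sketch below assumes single-token entities and a hypothetical function name; in a neural model, these integer offsets would typically be mapped to learned embedding vectors and concatenated with the word features.

```python
def position_features(tokens, e1_index, e2_index):
    """For each token, return its signed distance to entity 1 and entity 2."""
    return [(i - e1_index, i - e2_index) for i in range(len(tokens))]


tokens = "People have been moving back into downtown".split()
# Entity 1 is "People" (index 0) and entity 2 is "downtown" (index 6).
offsets = position_features(tokens, 0, 6)
for token, offset in zip(tokens, offsets):
    print(token, offset)
```

The token "moving" is three positions after the first entity and three before the second, so its position feature is (3, -3).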

Jul
2015

Dependency shortest path and subtrees have been shown to be effective for relation classification. Liu et al. propose a recursive neural network to model the dependency subtrees, and a convolutional neural network to capture the most important features on the shortest path.

Oct
2015

Song et al. present PKDE4J, a framework for dictionary-based entity extraction and rule-based relation extraction. Primarily meant for the biomedical field, it achieves reported F-measures of 85% for entity extraction and 81% for relation extraction. The RE algorithm uses dependency parse trees, which are analyzed to extract heuristic rules. They come up with 17 rules that can be applied to discern relations. Examples of rules include verb in dependency path, nominalization, negation, active/passive voice, entity order, etc.

Aug
2016

Miwa and Bansal propose to jointly model the tasks of NER and RE. A BiLSTM is used on word sequences to obtain the named entities. Another BiLSTM is used on dependency tree structures to obtain the relations. They also find that the shortest path dependency tree performs better than subtrees or full trees.

May
2019

Wu and He apply the BERT pre-trained language model to relation extraction. They call their model R-BERT. Named entities are identified beforehand and are delimited with special tokens. Since an entity can span multiple tokens, their start/end hidden token representations are averaged. The output is a softmax layer with cross-entropy as the loss function. On SemEval-2010 Task 8, R-BERT achieves a state-of-the-art macro-F1 score of 89.25. Other BERT-based models learn NER and RE jointly, or rely on topological features of an entity pair graph.
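The two preprocessing ideas mentioned here, delimiting entities with special tokens and averaging vectors over an entity span, can be sketched without any deep learning library. The marker symbols `$` and `#` and the function names are assumptions for illustration, and the vectors are toy numbers rather than real BERT hidden states.

```python
def mark_entities(tokens, e1_span, e2_span, marker1="$", marker2="#"):
    """Wrap two non-overlapping (start, end) spans, end exclusive, in markers."""
    marked = []
    for i, token in enumerate(tokens):
        if i == e1_span[0]:
            marked.append(marker1)
        if i == e2_span[0]:
            marked.append(marker2)
        marked.append(token)
        if i == e1_span[1] - 1:
            marked.append(marker1)
        if i == e2_span[1] - 1:
            marked.append(marker2)
    return marked


def average_vectors(vectors):
    """Element-wise mean of equally sized vectors, one per entity token."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


tokens = "Bill Gates works at Microsoft Inc.".split()
marked = mark_entities(tokens, (0, 2), (4, 6))
print(" ".join(marked))

# Toy two-dimensional "hidden states" for the two tokens of one entity.
entity_vector = average_vectors([[1.0, 2.0], [3.0, 4.0]])
print(entity_vector)
```

Averaging gives one fixed-size vector per entity regardless of how many tokens it spans, which is what allows the entity representations to be fed to a fixed-size classification layer.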

## Cite As

Devopedia. 2020. "Relation Extraction." Version 3, February 6. Accessed 2021-09-09. https://devopedia.org/relation-extraction
Contributed by
1 author

Last updated on
2020-02-06 07:38:52