Question Generation

Last updated by arvindpdmn on 2020-05-23 12:14:43
Created by arvindpdmn on 2020-04-07 05:38:59

Summary

An example sentence and possible questions. Source: Du et al. 2017, fig. 1.

Given some content, the goal of Question Generation (QG) is to automatically generate a set of questions that can be answered by that content. This content can be in the form of sentences, paragraphs, documents, databases or even images.

A common application of question generation is to automatically prepare questions for quizzes, assessments or even FAQs that present the content in a more readable form.

Traditionally, rules and templates were used to generate questions. Since the mid-2010s, there's been greater interest in using statistical methods, particularly neural networks. Question generation is also closely linked to other NLP tasks such as question answering. In fact, early neural network models were adapted from machine translation, text summarization and image captioning.

Milestones

1973
Chomsky's Subject Condition with examples. Source: Bach and Horn 1976, sec. 4.

In an article titled "Conditions on Transformations", Chomsky proposes a linguistic approach in which questions are transformations of canonical declarative sentences.

May
2003

Mitkov and Ha generate multiple choice questions using "transformational rules, a shallow parser, automatic term extraction, word sense disambiguation, a corpus and WordNet". Sentences with domain-specific terms are considered for question generation. Distractors are selected to be semantically close to the correct answer. For example, if the correct answer is 'syntax', 'semantics' and 'pragmatics' are better choices than 'football' or 'chemistry'. An example rule is to generate "Which HVO" (H for hypernym) for an SV-type sentence.

Jul
2003

Echihabi and Marcu generate factoid question-answer pairs by mining structured or semi-structured databases such as World Fact Book, Biography.com or WordNet. They apply extraction patterns from information retrieval to manually define template pairs. However, their end goal is not question generation but question answering. In fact, research on question generation in this decade is mostly motivated by the task of question answering.

2009
QG differs for narrative and informational texts. Source: Chen et al. 2009.

Chen et al. note that self-questioning improves reading comprehension among learners. In this context, they automatically generate questions from natural text. However, narrative text is different from informational text. The former involves characters, behaviour and mental states. The latter involves descriptions and explanations. For each question type, they use rules and question templates. Their approach uses Semantic Role Labelling (SRL). Lindberg et al. (2013) also use SRL and templates.

Jun
2010
Steps in a rule-based QG system. Source: Heilman and Smith 2010, fig. 1.

Heilman and Smith propose a rule-based QG system for generating factoid questions. The rules are based on linguistic knowledge, are generic and do not depend on sentence types. Their two-step process first simplifies the input sentence and then transforms it into a question. Answer phrases (noun phrases or prepositional phrases) are identified and then replaced with question phrases. They over-generate questions and then rank them using a logistic regression model.

Jul
2010

Following the annual tasks run by CoNLL, the first Shared Task Evaluation Challenge on Question Generation (QG-STEC) is organized. Although inspired by NLG, researchers note that QG is currently seen as a discourse processing task rather than an NLG task. This shared task aims to provide questions in an application-independent manner based on core ideas in the input text. They focus on two categories: QG from sentences and QG from paragraphs. For the latter, questions are of type "who, where, when, which, what, why, how many/long, yes/no." Input sources are Wikipedia, OpenLearn, and Yahoo!Answers.

2011

While previous research work mostly considered sentence-level context, Agarwal et al. consider paragraph-level context. Discourse connectives connect two clauses or sentences to establish temporal, causal, elaboration, contrast, or result relations. They make use of these connectives to identify question types and generate questions of the type "why, when, give an example, and yes/no". They use QGSTEC-2010 and Wikipedia datasets.

2015
Architecture of the topic-to-question generation system. Source: Chali and Hasan 2015, fig. 1.

Chali and Hasan propose topic-to-question generation in which input texts are about a specific topic. Generated questions are therefore topic-focused. Named entities, semantic role labels and predicate argument structures are used to generate questions. Using Latent Dirichlet Allocation (LDA), they identify sub-topics to rank questions by relevance. They also evaluate the syntactic correctness of generated questions. They argue that this approach enables a QA system to answer a complex question from simpler questions.

2017
Feature-rich encoder and attention-based decoder in a seq2seq QG model. Source: Zhou et al. 2017, fig. 1.

Sequence-to-sequence neural network models with attention, first applied to machine translation in 2014, are applied to the task of question generation. One approach uses a BiLSTM with attention-based sentence and paragraph encoders. Input can include rich features such as POS and NER tags. Another study uses pointer-softmax to copy relevant words from the input into the question. That model considers both the input document and answers. It also applies reinforcement learning by feeding the generated questions to a QA system.

Jun
2018
Joint training of QA and QG. Source: Tang et al. 2018, fig. 1.

Tang et al. explore how QG and QA models can be jointly trained. In particular, a policy gradient method can be used to update the QG model based on QA-specific signals. They apply a Generative Adversarial Network (GAN) to generate question-answer pairs. A collaboration detector (CD) determines positive versus negative training instances. In their QG model, they adapt the seq2seq model to Table2Seq. Table headers, cells and caption are encoded into continuous vectors using a bidirectional GRU. The decoder uses an attention-based GRU with a copying mechanism.

Jun
2018
LearningQ contains better training examples. Source: Chen et al. 2018, table 1.

Chen et al. note that the Stanford Question Answering Dataset (SQuAD) and RACE datasets were collected for reading comprehension and are not suited for assessing higher cognitive skills such as applying or analyzing. They address this by creating LearningQ, a dataset of 230K document-question pairs from online learning platforms. The dataset uses TED-Ed and Khan Academy as sources. They show that QG models that currently do well on other datasets are challenged by LearningQ.

Nov
2019
Architecture of BERT-HLSQG. Source: Chan and Fan 2019, fig. 4.

Chan and Fan use BERT for QG. Their simplest model uses only context and answer as inputs. It performs poorly because, unlike seq2seq models, it doesn't leverage previously decoded tokens. They therefore propose BERT-HLSQG, which marks the answer in the context and also feeds in previously decoded tokens. This model achieves state-of-the-art results on SQuAD.

Discussion

  • What are some applications of question generation?

    In education, question generation helps in assessing reading comprehension. Students can use it for self-assessment. It can be used to create quizzes and online tests without manual effort from educators. By one estimate, teachers spend 50% of their time on student assessments.

    Many versions of tests can be created to prevent cheating. For online courses and adaptive learning, these variations are helpful. A solution-oriented approach generates questions based on skills and concepts. A template-based approach uses variables, constraints and templates.

    For training question answering (QA) or dialogue systems, QG can produce question-answer pairs. Chatbots trained on QG can ask relevant questions in an ongoing dialogue.

    Often questions are generated from available answers. But it's possible to generate questions that seek more information or clarification. Such a model can be trained using conversational QA datasets.

    Applications also include help systems and multi-modal conversations involving virtual agents. One study applied QG for user authentication in online systems.

  • What are the different types of questions generated by QG systems?
    Good distractors are needed for MCQs. Source: Susanti et al. 2017, fig. 1.

    Multiple Choice Questions (MCQs) are commonly generated for student assessments. Along with the question, the correct answer and a few incorrect answers (called distractors) are generated. For English vocabulary assessment, QG systems could use WordNet, web searches and Word Sense Disambiguation.

    Fill-in-the-blank questions are simpler than MCQs since there's no need to generate distractors. Fill-in-the-blank statements are first generated, from which questions are framed via NLP analysis.
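
    As a rough illustration of gap selection, the sketch below blanks out one word of a sentence. The stopword list and the "longest content word" heuristic are assumptions made for this example; a real system would more likely rely on POS tags or term extraction.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "was", "were",
             "and", "or", "to", "by", "for", "with", "that", "this", "it"}

def make_cloze(sentence, blank="_____"):
    """Return (question, answer) by blanking one content word, or None."""
    words = re.findall(r"[A-Za-z]+", sentence)
    candidates = [w for w in words if w.lower() not in STOPWORDS]
    if not candidates:
        return None
    # Assumed heuristic: the longest content word is the most informative gap.
    answer = max(candidates, key=len)
    # Replace only the first whole-word occurrence of the answer.
    question = re.sub(r"\b" + re.escape(answer) + r"\b", blank, sentence, count=1)
    return question, answer

question, answer = make_cloze(
    "Photosynthesis occurs in the chloroplasts of plant cells."
)
print(question)
print(answer)
```

    The answer is returned alongside the question so that it can be used for grading.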

    Wh-questions are questions that ask "what, where, when, which, who, why" and "how". They could also include imperative statements that start with "name, tell, find, define or describe".

    Factoid questions are based on simple facts. Non-factoid questions require deeper analysis of the source text. These could involve causation, inference, reasoning or interpretation. For example, "What colour is the sky?" is a factoid question. "Why is the sky blue?" is a non-factoid question. Similarly, we distinguish between closed questions and open questions. Open questions are of the form "To what extent…", "Why…", "Should…", etc.

  • What are the typical challenges in generating questions?

    QG is quite unlike many NLP tasks. Whereas NLG generates sentences from some semantic input, the input to QG is often in natural language. Unlike machine translation, both input and output in QG are in the same language, and often there's no one-to-one correspondence. QG also significantly reorders and rephrases words.

    Whereas Question Answering (QA) is often extractive (selecting text spans from the input), question generation is often abstractive (generating text not necessarily present in the input). Moreover, a variety of questions can be framed from the many relations among words and phrases in the input. In one example, a single sentence generated 2000+ questions. QG systems must therefore separate important questions from trivial ones.

    Often a generated question might require improvements to its grammar, form or simplicity. QG systems need to figure out long-distance dependencies to do this.

  • What's a typical data pipeline in a question generation system?
    An example pipeline to generate fill-in-the-blank questions. Source: Aditya S 2018.

    QG systems can be classified based on the input type. The common ones are text-based QG, visual QG and structured data-based QG. Structured QG uses knowledge graphs and semi-structured tables, where the input is a subject-object-relation triplet.

    QG pipelines might depend on the type of input, type of questions and the nature of the model. In general, we expect three steps:

    • Pre-processing: Convert complex sentences into simpler ones, perhaps using parse trees. Using POS, dependency labels and SRL, classify sentences to determine type of question. Select content based on relevance to the final task. From web sources, mine and predict question patterns.
    • Question Construction: Generate correct answers and distractors for MCQs. Select gap position for fill-in-the-blank questions. Transform assertive sentences into interrogative forms. Control the difficulty of questions.
    • Post-processing: Construct the final surface form of questions. Rank questions to prioritize high-quality questions.
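
    The three steps above can be sketched as a toy rule-based pipeline. Everything here is a simplification for illustration: the function names, the single "X is/was Y" rule and the length-based ranking are assumptions, not taken from any cited system.

```python
import re

def preprocess(text):
    """Step 1: split text into sentences (naive split on end punctuation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def construct(sentence):
    """Step 2: turn 'X is/was Y.' into 'What is/was Y?' with answer X."""
    match = re.match(
        r"^(?P<subj>.+?)\s+(?P<verb>is|was)\s+(?P<rest>.+?)[.!?]?$", sentence
    )
    if not match:
        return None  # the one hypothetical rule does not apply
    return {
        "question": f"What {match['verb']} {match['rest']}?",
        "answer": match["subj"],
    }

def postprocess(candidates, max_words=12):
    """Step 3: drop overly long questions and rank shorter ones first."""
    kept = [c for c in candidates if len(c["question"].split()) <= max_words]
    return sorted(kept, key=lambda c: len(c["question"].split()))

def generate_questions(text):
    candidates = [construct(s) for s in preprocess(text)]
    return postprocess([c for c in candidates if c is not None])

text = ("Paris is the capital of France. It rains often. "
        "The Nile was the lifeline of ancient Egypt.")
results = generate_questions(text)
for item in results:
    print(item["question"], "->", item["answer"])
```

    A production system would replace each step with parsing, learned content selection and a trained ranker, but the data flow stays the same.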

    Visual QG systems might detect/classify objects and identify features via CNN (colour, size, shape, relations). Features may be object-specific or on the entire image. These are then used to generate questions.

  • Which are the neural network models for question generation?
    An improved seq2seq model for QG. Source: Zhao et al. 2018, fig. 1.

    Du et al. adopted the encoder-decoder architecture of a seq2seq model. The decoder was an LSTM that generated the question. Both an input sentence and its containing paragraph were encoded via separate BiLSTMs and then concatenated. Via an attention layer, the decoder learned to pay attention to more relevant parts of the encoded input representation. The encoders and the decoder had two layers each.
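
    To make the attention step concrete, here is a minimal sketch in plain Python. It uses simple dot-product scoring, which is an assumption for illustration and not necessarily the scoring function of any cited model; the vectors are made-up numbers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights over encoder states and the context vector."""
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    # The context vector is the attention-weighted sum of encoder states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy 2-dimensional encodings for three input tokens.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

    The first and third tokens align best with the decoder state, so they receive equal and larger weights than the second token.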

    Later models included the target answer at the input to avoid questions such as "What is mentioned?" Position embeddings may be used to give more attention to context words closer to the answer. Some decoders predict the question word (when, how, why, etc.) before generating the question.

    Seq2seq models struggle to capture paragraph-level context that's needed to generate high quality questions. Zhao et al. extended the seq2seq model with answer tagging, maxout pointer mechanism and a gated self-attention encoder. Another approach uses multi-stage attention.

    Transformer-based models also capture longer context. One study pre-processed the input with NER. Other models have adapted BERT and GPT-2 for QG.

  • How is question answering relevant to question generation?

    From about 2000 to 2010, interest in QG was mainly to aid QA systems. Later, motivated by applications, QG became an important task on its own. However, the synergy between QG and QA continues. In 2017, papers were published that showed that QG and QA are dual tasks. Similarly, visual QG and visual QA are considered dual tasks.

    Joint probability between questions and answers can be applied. The probabilistic correlation between QA and QG can be used to regularize the training process. A QA model evaluates if a question generated by a QG model can be answered. A QA-specific signal feeds into the QG loss function. The QG model estimates the probability of a question given the answer, which helps QA. It's possible to train both models simultaneously. Tang et al. used a seq2seq model for QG and an RNN for QA.
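
    One way to write down this probabilistic correlation is to factor the joint probability of a question q and an answer a in two ways. The squared penalty below is a sketch of how the constraint could act as a regularizer; the symbol names are chosen for this example.

```latex
P(q, a) = P(a)\,P(q \mid a) = P(q)\,P(a \mid q)

\ell_{\text{dual}}(q, a) =
  \bigl[\log P(a) + \log P(q \mid a) - \log P(q) - \log P(a \mid q)\bigr]^{2}
```

    Here P(q | a) comes from the QG model and P(a | q) from the QA model, while P(q) and P(a) would be estimated separately, for instance with language models. The penalty is zero when both factorizations agree.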

    For supervised training of any end-to-end QA model (using neural networks) we require lots of training data. Where data is limited, QG helps by automatically creating questions from any given input. Duan et al. achieved this by crawling community-QA websites such as Quora or Yahoo!Answers. They also generated questions from passages using CNN and RNN.

  • Which datasets are useful for building QG models?
    QG datasets compared. Source: Pan et al. 2019, table 1.

    Datasets assembled for Question Answering (QA) or Machine Comprehension (MC) can be reused in QG. For example, MC datasets typically have document-question-answer triples with the goal of predicting the answer. QG models are instead trained to predict the question.
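
    Reusing such a dataset mostly means swapping the prediction target. The sketch below assumes each record is a (document, question, answer) tuple; the function name and the "<sep>" separator are hypothetical.

```python
def to_qg_example(triple, answer_aware=True):
    """Turn a (document, question, answer) triple into a (source, target) pair."""
    document, question, answer = triple
    # Answer-aware QG conditions on the answer as well as the document.
    # "<sep>" is a made-up separator token for this example.
    source = f"{answer} <sep> {document}" if answer_aware else document
    # For QG the question becomes the target to predict.
    return source, question

triples = [
    ("The Nile flows north into the Mediterranean Sea.",
     "Which sea does the Nile flow into?",
     "the Mediterranean Sea"),
]
pairs = [to_qg_example(t) for t in triples]
print(pairs[0])
```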

    WikiQA is a text-based dataset. WebQuestions and SimpleQuestions are knowledge-based datasets. SQuAD and MS MARCO are for MC. SQuAD 2.0 has about 100K question-answer pairs and another 50K questions without answers. NewsQA has 120K question-answer pairs gathered from CNN news. Based on Freebase, one corpus has 30M factoid question-answer pairs. SciQ has 13.7K MCQs on biology, chemistry, earth science and physics. Some of these datasets were crowdsourced via Amazon Mechanical Turk (AMT).

    Specifically for QG, the QG-STEC dataset was created in 2010. It has 180 questions from sentences and 390 questions from paragraphs. Medical CBQ is a corpus for the medical domain. MCQL has 7.1K MCQs from web crawling on topics of biology, physics and chemistry.

    For non-factoid questions, community-driven question answering websites could be harvested for a large volume of training data. Yahoo!Answers and Quora are examples.

  • How can I evaluate question generation models?
    Various criteria for human evaluation of QG systems. Source: Amidei et al. 2018, table 7.

    Intrinsic evaluation analyzes the QG model's output based on criteria such as grammaticality or fluency. Extrinsic evaluation is about measuring system or application performance in which the QG model is used. The trend has been towards automatic and intrinsic evaluation. However, there's no common framework that makes it easy to compare different QG models.

    Suppose students' language proficiency is evaluated from two sets of questions, one machine-generated and the other human-generated. This is extrinsic evaluation. Instead, if experts subjectively evaluate the machine-generated questions, it's intrinsic evaluation.

    With human evaluation, criteria to consider are relevance, syntactic correctness, fluency, ambiguity, question type, and variety. These are quality judgements. It's also important for annotators to agree with one another.
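
    Annotator agreement can be quantified with a chance-corrected statistic. The sketch below uses Cohen's kappa for two annotators; this is one common choice, assumed here for illustration, and the labels are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

annotator_1 = ["good", "good", "bad", "good", "bad", "bad"]
annotator_2 = ["good", "bad", "bad", "good", "bad", "good"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

    A value near 1 indicates strong agreement, while a value near 0 means agreement is no better than chance.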

    For automatic evaluation, ROUGE, precision, recall and F1 are some useful intrinsic metrics. Perplexity based on a language model is a metric to measure fluency. In general, models with high scores could perform poorly on human evaluation. Question answerability may be a better metric compared to n-gram similarity metrics such as BLEU, METEOR and NIST.
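
    As a simplified illustration of n-gram overlap metrics, the sketch below computes clipped n-gram precision, recall and F1 between a generated question and a reference. It is not a full BLEU or ROUGE implementation; whitespace tokenization and lowercasing are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_scores(generated, reference, n=1):
    """Clipped n-gram precision, recall and F1 between two questions."""
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count, clipping repeated n-grams.
    matched = sum((gen & ref).values())
    precision = matched / sum(gen.values()) if gen else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

precision, recall, f1 = overlap_scores(
    "what is the capital of france ?",
    "which city is the capital of france ?",
)
print(precision, recall, f1)
```

    Both questions are equally answerable, yet the score is below 1, which hints at why overlap metrics can disagree with human judgement.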

References

  1. Aditya S. 2018. "Using Natural Language Processing for Smart Question Generation." Intel AI Developer Program, Intel Software, July 4. Accessed 2020-04-07.
  2. Agarwal, Manish, Rakshit Shah, and Prashanth Mannem. 2011. "Automatic Question Generation using Discourse Cues." Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications, ACL, pp. 1-9, June. Accessed 2020-05-22.
  3. Amidei, Jacopo, Paul Piwek, and Alistair Willis. 2018. "Evaluation methodologies in Automatic Question Generation 2013-2018." Proceedings of the 11th International Conference on Natural Language Generation, ACL, pp. 307-317, November. Accessed 2020-04-07.
  4. Bach, Emmon, and George M. Horn. 1976. "Remarks on Conditions on Transformations." Linguistic Inquiry, The MIT Press, vol. 7, no. 2, pp. 265-99. Accessed 2020-05-21.
  5. Brundan, Matt. 2013. "Research Question Generation." SlideShare, October 9. Accessed 2020-04-07.
  6. Chali, Yllias, and Sadid A. Hasan. 2015. "Towards Topic-to-Question Generation." Computational Linguistics, ACL, vol. 41, no. 1, pp. 1-20, March. Accessed 2020-05-22.
  7. Chan, Ying-Hong, and Yao-Chung Fan. 2019. "A Recurrent BERT-based Model for Question Generation." Proceedings of the 2nd Workshop on Machine Reading for Question Answering, ACL, pp. 154-162, November. Accessed 2020-04-07.
  8. Chen, Wei, Gregory Aist, and Jack Mostow. 2009. "Generating Questions Automatically from Informational Text." Project LISTEN, Carnegie Mellon University. Accessed 2020-05-22.
  9. Chen, Guanliang, Jie Yang, Claudia Hauff, and Geert-Jan Houben. 2018. "LearningQ: A Large-Scale Dataset for Educational Question Generation." The 12th International AAAI Conference on Web and Social Media (ICWSM 2018), pp. 481-490, June 25-28. Accessed 2020-04-07.
  10. Du, Xinya, Junru Shao, and Claire Cardie. 2017. "Learning to Ask: Neural Question Generation for Reading Comprehension." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1342-1352, July. Accessed 2020-04-07.
  11. Duan, Nan, Duyu Tang, Peng Chen, and Ming Zhou. 2017. "Question Generation for Question Answering." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 866-874, September. Accessed 2020-04-07.
  12. Echihabi, Abdessamad, and Daniel Marcu. 2003. "A Noisy-Channel Approach to Question Answering." Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 16-23, July. Accessed 2020-05-21.
  13. Goldman, Jonathan. 2013. "Algorithmically Generating Questions." Knewton, on Medium, September 19. Accessed 2020-04-07.
  14. Heilman, Michael, and Noah A. Smith. 2010. "Good Question! Statistical Ranking for Question Generation." Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, ACL, pp. 609-617, June. Accessed 2020-05-21.
  15. Hosking, Tom, and Sebastian Riedel. 2019. "Evaluating Rewards for Question Generation Models." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), ACL, pp. 2278-2283, v2, June. Accessed 2020-04-07.
  16. Kriangchaivech, Kettip, and Artit Wangperawong. 2019. "Question Generation by Transformers." arXiv, v2, September 14. Accessed 2020-04-07.
  17. Kurdi, Ghader, Jared Leo, Bijan Parsia, Uli Sattler, and Salam Al-Emari. 2020. "A Systematic Review of Automatic Question Generation for Educational Purposes." International Journal of Artificial Intelligence in Education, vol. 30, pp. 121-204, March. Accessed 2020-04-07.
  18. Lewis, Patrick, Ludovic Denoyer, and Sebastian Riedel. 2019. "Unsupervised Question Answering by Cloze Translation." arXiv, v2, June 27. Accessed 2020-04-07.
  19. Li, Yikang, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, and Xiaogang Wang. 2017. "Visual Question Generation as Dual Task of Visual Question Answering." arXiv, v1, September 21. Accessed 2020-05-22.
  20. Lindberg, David, Fred Popowich, John Nesbit, and Phil Winne. 2013. "Generating Natural Language Questions to Support Learning On-Line." Proceedings of the 14th European Workshop on Natural Language Generation, ACL, pp. 105-114, August. Accessed 2020-05-22.
  21. Lopez, Luis Enrico, Diane Kathryn Cruz, Jan Christian Blaise Cruz, and Charibeth Cheng. 2020. "Transformer-based End-to-End Question Generation." arXiv, v1, May 3. Accessed 2020-05-23.
  22. Microsoft Research. 2018. "Question Generation (QG)." Microsoft Research, March 13. Accessed 2020-04-07.
  23. Mitkov, Ruslan, and Le An Ha. 2003. "Computer-Aided Generation of Multiple-Choice Tests." Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications Using Natural Language Processing, pp. 17-22. Accessed 2020-05-21.
  24. Nema, Preksha, and Mitesh M. Khapra. 2018. "Towards a Better Metric for Evaluating Question Generation Systems." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 3950-3959, October-November. Accessed 2020-04-07.
  25. Pan, Liangming, Wenqiang Lei, Tat-Seng Chua, and Min-Yen Kan. 2019. "Recent Advances in Neural Question Generation." arXiv, v3, June 4. Accessed 2020-04-07.
  26. Rus, Vasile, Brendan Wyse, Paul Piwek, Mihai Lintean, Svetlana Stoyanchev, and Christian Moldovan. 2010. "The First Question Generation Shared Task Evaluation Challenge." Proceedings of the 6th International Natural Language Generation Conference. Accessed 2020-04-07.
  27. Scialom, Thomas, and Jacopo Staiano. 2019. "Ask to Learn: A Study on Curiosity-driven Question Generation." arXiv, v1, November 8. Accessed 2020-04-07.
  28. Surdeanu, Mihai, Massimiliano Ciaramita, and Hugo Zaragoza. 2011. "Learning to Rank Answers to Non-Factoid Questions from Web Collections." Computational Linguistics, vol. 37, no. 2, pp. 351-383, June. Accessed 2020-05-21.
  29. Susanti, Yuni, Takenobu Tokunaga, Hitoshi Nishikawa, and Hiroyuki Obari. 2017. "Evaluation of automatically generated English vocabulary questions." Research and Practice in Technology Enhanced Learning, vol. 12, article no. 11, March. doi:10.1186/s41039-017-0051-y. Accessed 2020-04-07.
  30. Tang, Duyu, Nan Duan, Tao Qin, Zhao Yan, and Ming Zhou. 2017. "Question Answering and Question Generation as Dual Tasks." arXiv, v2, August 4. Accessed 2020-04-07.
  31. Tang, Duyu, Nan Duan, Zhao Yan, Zhirui Zhang, Yibo Sun, Shujie Liu, Yuanhua Lv, and Ming Zhou. 2018. "Learning to Collaborate for Question Answering and Asking." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1564-1574, June. Accessed 2020-04-07.
  32. Tuan, Luu Anh, Darsh J Shah, and Regina Barzilay. 2019. "Capturing Greater Context for Question Generation." arXiv, v1, October 22. Accessed 2020-04-07.
  33. Uehara, Kohei, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, and Tatsuya Harada. 2018. "Visual Question Generation for Class Acquisition of Unknown Objects." arXiv, v1, August 6. Accessed 2020-05-22.
  34. Vanderwende, Lucy. 2008. "The Importance of Being Important: Question Generation." Workshop on the Question Generation Shared Task and Evaluation Challenge, September. Accessed 2020-05-21.
  35. Woo, Simon S., Zuyao Li, and Jelena Mirkovic. 2016. "Good Automatic Authentication Question Generation." Proceedings of the 9th International Natural Language Generation Conference, ACL, pp. 203-206, September 5-8. Accessed 2020-05-22.
  36. Yuan, Xingdi, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, and Adam Trischler. 2017. "Machine Comprehension by Text-to-Text Neural Question Generation." arXiv, v2, May 15. Accessed 2020-04-07.
  37. Zhao, Yao, Xiaochuan Ni, Yuanyuan Ding, and Qifa Ke. 2018. "Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 3901-3910, October-November. Accessed 2020-04-07.
  38. Zhou, Qingyu, Nan Yang, Furu Wei, Chuanqi Tan, Hangbo Bao, and Ming Zhou. 2017. "Neural Question Generation from Text: A Preliminary Study." arXiv, v3, April 18. Accessed 2020-05-21.
  39. Zukerman, Ingrid, and Eric Horvitz. 2001. "Using Machine Learning Techniques to Interpret WH-questions." Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 547-554, July. Accessed 2020-05-21.

Milestones

1973
Chomsky's Subject Condition with examples. Source: Bach and Horn 1976, sec. 4.

In an article titled Conditions on transformations Chomsky proposes a linguistic approach in which questions are really transformations of canonical declarative sentences.

May
2003

Mitkov and Ha generate multiple choice questions using "transformational rules, a shallow parser, automatic term extraction, word sense disambiguation, a corpus and WordNet". Sentences with domain-specific terms are considered for question generation. Distractors are selected to be semantically close to the correct answer. For example, if the correct answer is 'syntax', 'semantics' and 'pragmatics' are better choices than 'football' or 'chemistry'. An example rule is to generate "Which HVO" (H for hypernym) for an SV-type sentence.

Jul
2003

Echihabi and Marcu generate factoid question-answer pairs by mining structured or semi-structured databases such as World Fact Book, Biography.com or WordNet. They apply extraction patterns from information retrieval to manually define template pairs. However, their end goal is not question generation but question answering. In fact, research on question generation in this decade is mostly motivated by the task of question answering.

2009
QG differs for narrative and informational texts. Source: Chen et al. 2009.

Chen et al. note that self-questioning improves reading comprehension among learners. In this context, they automatically generate questions from natural text. However, narrative text is different from informational text. The former involves characters, behaviour and mental states. The latter involves descriptions and explanations. For each question type, they use rules and question templates. Their approach uses Semantic Role Labelling (SRL). Lindberg et al. (2013) also use SRL and templates.

Jun
2010
Steps in a rule-based QG system. Source: Heilman and Smith 2010, fig. 1.

Heilman and Smith propose a rule-based QG system for generating factoid questions. Rules are based on linguistic knowledge. Rules are generic and not dependent on sentence types. Their two-step process first simplifies input sentence and then transforms it into a question. Answer phrases (noun phrases or prepositional phrases) are identified and then replaced with question phrases. They over-generate questions and then rank them using a logistic regression model.

Jul
2010

Following the annual tasks run by CoNLL, the first Shared Task Evaluation Challenge on Question Generation (QG-STEC) is organized. Although inspired by NLG, researchers note that QG is currently seen as a discourse processing task rather than an NLG task. This shared task aims to provide questions in an application-independent manner based on core ideas in the input text. They focus on two categories: QG from sentences and QG from paragraphs. For the latter, questions are of type "who, where, when, which, what, why, how many/long, yes/no." Input sources are Wikipedia, OpenLearn, and Yahoo!Answers.

2011

While previous research work mostly considered sentence-level context, Agarwal et al. consider paragraph-level context. Discourse connectives connect two clauses or sentences to establish temporal, causal, elaboration, contrast, or result relations. They make use of these connectives to identify question types and generate questions of the type "why, when, give an example, and yes/no". They use QGSTEC-2010 and Wikipedia datasets.

2015
Architecture of the topic-to-question generation system. Source: Chali and Hasan 2015, fig. 1.

Chali and Hasan propose topic-to-question generation in which input texts are about a specific topic. Generated questions are therefore topic-focused. Named entities, semantic role labels and predicate argument structures are used to generate questions. Using Latent Dirichlet Allocation (LDA), they identify sub-topics to rank questions by relevance. They also evaluate the syntactic correctness of generated questions. They argue that this approach enables a QA system to answer a complex question from simpler questions.

2017
Feature-rich encoder and attention-based decoder in a seq2seq QG model. Source: Zhou et al. 2017, fig. 1.

Sequence-to-sequence neural network models with attention, first applied to machine translation in 2014, are applied to the task of question generation. In one approach using BiLSTM, an attention-based sentence encoder and a paragraph encoder are used. Input could include rich features such as POS and NER tags. Another study uses pointer-softmax to copy relevant words from the input into the question. That model considers both the input document and the answers. It also applies reinforcement learning by feeding the generated questions to a QA system.
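
To make the copy idea concrete, here is a minimal pure-Python sketch of one decoding step that blends a vocabulary distribution with an attention-based copy distribution over source tokens. It is written in the spirit of pointer-softmax and is not the cited models' code: the dot-product attention, the scalar `switch` value, and all function names are assumptions. In a trained model the switch and both distributions would come from learned networks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(decoder_state, encoder_states):
    """Dot-product attention of one decoder state over all encoder states."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    return softmax(scores)


def mix_copy_and_generate(source_tokens, attn, vocab_probs, switch):
    """Blend generation and copying into one distribution over words.

    switch is the probability of generating from the vocabulary;
    (1 - switch) is the probability of copying a source token,
    with the attention weights used as the copy distribution.
    """
    mixed = {word: switch * p for word, p in vocab_probs.items()}
    for token, a in zip(source_tokens, attn):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - switch) * a
    return mixed


if __name__ == "__main__":
    source = ["Curie", "discovered", "polonium"]
    encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    attn = attention_weights([0.2, 0.9], encoder_states)
    dist = mix_copy_and_generate(source, attn, {"who": 0.6, "what": 0.4}, switch=0.5)
    print(max(dist, key=dist.get))
```

Because copying adds probability mass to words that appear in the source, rare words such as names can appear in the question even when they are missing from the output vocabulary.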

Jun
2018
Joint training of QA and QG. Source: Tang et al. 2018, fig. 1.

Tang et al. explore how QG and QA models can be jointly trained. In particular, a policy gradient method can be used to update the QG model based on QA-specific signals. They apply a Generative Adversarial Network (GAN) to generate question-answer pairs. A collaboration detector (CD) determines positive versus negative training instances. In their QG model, they adapt the seq2seq model to Table2Seq. The table's headers, cells, and caption are encoded into continuous vectors using a bidirectional GRU. The decoder uses an attention-based GRU with a copying mechanism.
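
The policy gradient idea can be illustrated with a single REINFORCE-style update. The sketch below is a simplification and not Tang et al.'s training procedure: the policy is a plain categorical distribution over a fixed list of candidate questions, and the reward is a number assumed to come from a QA model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, sampled_index, reward, lr=0.1):
    """One REINFORCE update for a categorical policy.

    For a softmax policy, the gradient of log p(sampled) with respect
    to logit j is (1 if j == sampled else 0) - p(j). Each logit moves
    along that gradient, scaled by the reward and the learning rate.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if j == sampled_index else 0.0) - p)
        for j, (logit, p) in enumerate(zip(logits, probs))
    ]


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0]  # three candidate questions, equally likely
    # Assume the QA model rewarded candidate 1 (hypothetical reward value).
    updated = reinforce_step(logits, sampled_index=1, reward=1.0, lr=0.5)
    print(softmax(updated))
```

A positive reward raises the probability of the sampled question and a negative reward lowers it, which is how a QA-specific signal can steer the QG model.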

Jun
2018
LearningQ contains better training examples. Source: Chen et al. 2018, table 1.

Chen et al. note that the Stanford Question Answering Dataset (SQuAD) and RACE datasets were collected for reading comprehension and are not suited to assessing higher cognitive skills such as applying or analyzing. They address this by creating LearningQ, a dataset of 230K document-question pairs from online learning platforms. The dataset uses TED-Ed and Khan Academy as sources. They show that QG models that currently do well on other datasets are challenged by LearningQ.

Nov
2019
Architecture of BERT-HLSQG. Source: Chan and Fan 2019, fig. 4.

Chan and Fan use BERT for QG. Their simplest model uses only the context and the answer as inputs. It performs poorly because, unlike seq2seq models, it doesn't leverage previously decoded tokens. They therefore propose BERT-HLSQG, which marks the answer in the context and also feeds in previously decoded tokens. This model achieves state-of-the-art results on SQuAD.
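
A minimal sketch of how such an input sequence might be assembled at each decoding step is given below. The `[HL]` marker token, the function name, and the index convention are assumptions for illustration; tokenization and the BERT model itself are omitted.

```python
def build_hlsqg_input(context_tokens, answer_start, answer_end, decoded_tokens):
    """Assemble one decoding-step input as a list of tokens.

    The answer span context_tokens[answer_start:answer_end] is wrapped in
    [HL] markers. Tokens decoded so far follow [SEP], and a final [MASK]
    marks the position whose token the model should predict next.
    """
    highlighted = (
        context_tokens[:answer_start]
        + ["[HL]"]
        + context_tokens[answer_start:answer_end]
        + ["[HL]"]
        + context_tokens[answer_end:]
    )
    return ["[CLS]"] + highlighted + ["[SEP]"] + list(decoded_tokens) + ["[MASK]"]


if __name__ == "__main__":
    context = ["Curie", "discovered", "polonium"]
    # Step 1: nothing decoded yet. Step 2: one token decoded.
    print(build_hlsqg_input(context, 2, 3, []))
    print(build_hlsqg_input(context, 2, 3, ["What"]))
```

Decoding repeats this step, appending each predicted token to `decoded_tokens` until an end marker is produced.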

Further Reading

  1. Pan, Liangming, Wenqiang Lei, Tat-Seng Chua, and Min-Yen Kan. 2019. "Recent Advances in Neural Question Generation." arXiv, v3, June 4. Accessed 2020-04-07.
  2. Agarwal, Manish, Rakshit Shah, and Prashanth Mannem. 2011. "Automatic Question Generation using Discourse Cues." Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications, ACL, pp. 1-9, June. Accessed 2020-05-22.
  3. Tuan, Luu Anh, Darsh J Shah, and Regina Barzilay. 2019. "Capturing Greater Context for Question Generation." arXiv, v1, October 22. Accessed 2020-04-07.
  4. Scialom, Thomas, and Jacopo Staiano. 2019. "Ask to Learn: A Study on Curiosity-driven Question Generation." arXiv, v1, November 8. Accessed 2020-04-07.
  5. Lopez, Luis Enrico, Diane Kathryn Cruz, Jan Christian Blaise Cruz, and Charibeth Cheng. 2020. "Transformer-based End-to-End Question Generation." arXiv, v1, May 3. Accessed 2020-05-23.
  6. Zhao, Yao, Xiaochuan Ni, Yuanyuan Ding, and Qifa Ke. 2018. "Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 3901-3910, October-November. Accessed 2020-04-07.

Cite As

Devopedia. 2020. "Question Generation." Version 3, May 23. Accessed 2020-05-25. https://devopedia.org/question-generation