Grammar and Spell Checker

Summary of approaches to Grammar Error Correction (GEC). Source: Adapted from Ailani et al. 2019, figs. 1-4.

A well-written article with correct grammar, punctuation, and spelling, along with a tone and style suited to the intended reader or community, is always important. Software tools offer algorithm-based solutions for grammar and spell checking and correction.

Classical rule-based approaches employ a dictionary of words along with a set of rules. Recent neural network-based approaches learn from millions of published articles and suggest appropriate word choices and ways to phrase parts of sentences to adjust the tone, style, and semantics of the sentence. They can tailor suggestions to the article's publication domain, such as academic or news.

Grammar and spelling correction are tasks that belong to a more general NLP process called lexical disambiguation.

Discussion

  • What is a grammar and spell checker, and what are its general tasks and uses?
    Illustrating grammar and spell checks and suggested corrections. Source: Devopedia 2021.

    A grammar and spell checker is a software tool that checks a written text for grammatical mistakes, appropriate punctuation, misspellings, and issues related to sentence structure. More recently, neural network-based tools also evaluate tone, style, and semantics to ensure that the writing is flawless.

    Often such tools offer a visual indication by highlighting or underlining spelling and grammar errors in different colors (often red for spelling and blue for grammar). Upon hovering over or clicking on the highlighted parts, they offer ranked suggestions to correct those errors. Certain tools display a corrected version, showing corrections as strikeouts in an appropriate color.

    Such tools are used to improve writing, produce engaging content, and for assessment and training purposes. Several tools also offer style correction to adapt the article for specific domains like academic publishing, marketing and advertising, legal, news reporting, etc.

    However, to date, no tool is a perfect substitute for an expert human evaluator.

  • What are some important terms relevant to a grammar and spell checker?

    The following NLP terms and approaches are relevant to grammar and spell checkers:

    • Part-of-Speech (PoS) tagging marks words as noun, verb, adverb, etc. based on definition and context.
    • Named Entity Recognition (NER) is labeling a sequence of text into predefined categories such as name, location, etc. Labels help determine the context of words around them.
    • Confusion Set is a set of probable words that can appear in a certain context, e.g. set of articles before a noun.
    • N-Gram is a sub-sequence of n words or tokens. For example, "The sun is bright" has these 2-grams: {"the sun", "sun is", "is bright"}.
    • Parallel Corpus is a collection of text placed alongside its translation, e.g. text with errors and its corresponding corrected version(s).
    • Language Model (LM) determines the probability distribution over a sequence of words. It indicates how likely a particular sequence of words is.
    • Machine Translation (MT) is a software approach to translate one sequence of text into another. In grammar checking, this refers to translating erroneous text into correct text.
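    The n-gram concept above can be illustrated with a minimal Python sketch (whitespace tokenization is a simplifying assumption; real tokenizers handle punctuation and case more carefully):

```python
def ngrams(text, n):
    """Return all n-grams (as strings) of the whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams("The sun is bright", 2)
# → ['the sun', 'sun is', 'is bright']
```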
  • What are the various types of grammar and spelling errors?
    Types of grammar and spelling errors. Source: Soni and Thakur 2018, fig. 3.

    We describe the following types:

    • Sentence Structure: Parts of speech are organized incorrectly. For example, "she began to singing" shows misplaced 'to' or '-ing'. Other structural errors include a dependent clause without a main clause, a run-on sentence due to a missing conjunction, or a missing subject.
    • Syntax Error: Violation of rules of grammar. These can be in relation to subject-verb agreement, wrong/missing article or preposition, verb tense or verb form error, or a noun number error.
    • Punctuation Error: Punctuation marks like comma, semi-colon, period, exclamation, question mark, etc. are missing, unnecessary, or wrongly placed.
    • Spelling Error: Word is not known in the dictionary.
    • Semantic Error: Grammar rules are followed but the sentence doesn't make sense, often due to a wrong choice of words. "I am going to the library to buy a book" is an example where 'bookstore' should replace 'library'. Rule-based approaches typically can't handle semantic errors. They require statistical or machine learning approaches, which can also flag other types of errors. Often a combination of approaches leads to a good solution.
  • What are classical methods for implementing grammar and spell checkers?
    The noisy channel model. Source: Jurafsky 2019.

    Classical methods of spelling correction match words against a given dictionary. Critics consider this approach unreliable, since it can't detect incorrect use of correctly spelled words and it flags correct words missing from the dictionary, such as technical terms, acronyms, etc.

    Grammar checkers apply hand-coded grammar rules to PoS-tagged text. For instance, the rule I + Verb (3rd person, singular form) captures incorrect verb form usage, as in the phrase "I has a dog." These methods provide detailed explanations of flagged errors, making them helpful for learning. However, rule maintenance is tedious, and the rules are devoid of context.
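    As a minimal illustration (not any particular checker's rule set), such a rule can be coded as a pattern over (word, PoS-tag) pairs:

```python
def flag_agreement(tagged):
    """Flag a 3rd-person-singular verb (Penn Treebank tag VBZ) right after 'I',
    as in 'I has a dog'. Tags are assumed to come from a PoS tagger."""
    errors = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if w1 == "I" and t2 == "VBZ":
            errors.append((w2, "use the base verb form after 'I'"))
    return errors

flag_agreement([("I", "PRP"), ("has", "VBZ"), ("a", "DT"), ("dog", "NN")])
# → [('has', "use the base verb form after 'I'")]
```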

    Statistical approaches validate parts of a sentence (n-grams) against their presence in a corpus. These approaches can flag words used out of context. However, it's challenging to provide detailed explanations, and their effectiveness is limited by the choice of corpus.

    The noisy channel model is one statistical approach: the best correction maximizes the product of a language model prior and an error-model likelihood. An LM based on trigrams and bigrams gives better results than unigrams alone. Where rare words are wrongly corrected, a blacklist of words or a probability threshold can help.
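    A toy sketch of the noisy channel idea follows. The corpus and the one-letter-difference error model are illustrative assumptions; real systems estimate the channel from observed misspellings:

```python
from collections import Counter

# A toy corpus stands in for a large LM training text (assumption).
CORPUS = "the cat sat on the mat the cat ate".split()
COUNTS = Counter(CORPUS)
N = sum(COUNTS.values())

def prior(c):
    "Unigram LM probability P(c)."
    return COUNTS[c] / N

def channel(w, c):
    "Crude error-model proxy P(w|c): 1 for an exact match, 0.1 for a one-letter difference."
    if w == c:
        return 1.0
    if len(w) == len(c) and sum(a != b for a, b in zip(w, c)) == 1:
        return 0.1
    return 0.0

def correct(w):
    "Pick the vocabulary word maximizing P(w|c) * P(c)."
    return max(COUNTS, key=lambda c: channel(w, c) * prior(c))

correct("cap")
# → 'cat' (the only known word one letter away, weighted by its frequency)
```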

  • What are Machine Learning-based methods for implementing grammar and spell checkers?

    ML-based approaches are either Classification (discriminative) or Machine Translation (generative).

    Classification approaches work with well-defined errors. Each error type (article, preposition, etc.) requires training a separate multi-class classifier. For example, a preposition error classifier takes n-grams associated with prepositions in a sentence and outputs a score for every candidate preposition in the confusion set. Contextual corrections also consider features like PoS and NER. A model can be a linear classifier like a Support Vector Machine (SVM), an n-gram LM-based or Naïve Bayes classifier, or even a DNN-based classifier.
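    A stripped-down n-gram LM-based scorer over a preposition confusion set might look like this (the corpus and confusion set are toy assumptions; real classifiers add PoS, NER, and wider context features):

```python
from collections import Counter

# A toy corpus stands in for training data (assumption).
corpus = "she sat on the chair he relies on luck it depends on you interested in math".split()
bigrams = Counter(zip(corpus, corpus[1:]))

CONFUSION = ["on", "in", "at"]  # confusion set of candidate prepositions

def best_preposition(prev_word):
    "Score each candidate preposition by its bigram count after prev_word."
    return max(CONFUSION, key=lambda p: bigrams[(prev_word, p)])

best_preposition("depends")     # → 'on'
best_preposition("interested")  # → 'in'
```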

    Machine Translation approaches can be Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Both these use parallel corpora to train a sequence-to-sequence model, where text with errors translates to corrected text. NMT uses encoder-decoder architecture, where an encoder determines a latent vector for a sentence based upon the input word embeddings. The decoder then generates target tokens from the latent vector and relevant surrounding input and output tokens (attention). These benefit from transfer learning and advancements in transformer-based architecture. Editor models reduce training time by outputting edits to input tokens from a reduced confusion set instead of generating target tokens.

  • How can I train an NMT model for grammar and spell checking?
    Training an NMT for GEC. Source: Adapted from Naghshnejad et al. 2020, fig. 3, fig. 5, table 4.

    In general, NMT requires training an encoder-decoder model using cross-entropy as the loss function, comparing the maximum likelihood output to the gold standard correct output. Training a good model requires large parallel corpora and substantial compute capacity. Transformers are attention-based deep seq2seq architectures. Pre-trained language models generated by transformer architectures like BERT provide contextual embeddings to find the most likely token given the surrounding tokens, making them useful for flagging contextual errors in an n-gram.

    Transfer learning via fine tuning weights of a transformer using the parallel corpus of incorrect to correct examples makes it suitable for GEC use. Pre-processing or pre-training with synthetic data improves the performance and accuracy. Further enhancements can be to use separate heads for different types of errors.

    Editor models are better as they output edit sequences instead of corrected versions. Training and testing of editor models require the generation of edit sequences from source-target parallel texts.
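    Edit sequences can be derived from source-target pairs by alignment. A minimal sketch using Python's difflib (a simplified stand-in for the alignment used by real editor models):

```python
import difflib

def edit_tags(source, target):
    "Derive per-token edit operations (KEEP/REPLACE/DELETE/INSERT) from a source-target pair."
    src, tgt = source.split(), target.split()
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=src, b=tgt).get_opcodes():
        if tag == "equal":
            ops += [("KEEP", w) for w in src[i1:i2]]
        elif tag == "replace":
            ops += [("REPLACE", w, t) for w, t in zip(src[i1:i2], tgt[j1:j2])]
        elif tag == "delete":
            ops += [("DELETE", w) for w in src[i1:i2]]
        else:  # insert
            ops += [("INSERT", t) for t in tgt[j1:j2]]
    return ops

edit_tags("I has a dog", "I have a dog")
# → [('KEEP', 'I'), ('REPLACE', 'has', 'have'), ('KEEP', 'a'), ('KEEP', 'dog')]
```

An editor model then predicts such per-token operations rather than regenerating the whole sentence.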

  • What datasets are available for training and evaluation of grammar and spell check models?

    MT or classification models need datasets with annotated errors. NMT requires a large amount of data.

    Lang-8, the largest available parallel corpus, has 100,051 English entries. Corpus of Linguistic Acceptability (CoLA) is a dataset of sentences labeled as either grammatically correct or incorrect. It can be used, for example, to fine-tune a pre-trained model. GitHub Typo Corpus is harvested from GitHub and contains errors and their corrections.

    Benchmarking data in Standard Generalized Markup Language (SGML) format is available. Sebastian Ruder offers a detailed list of available benchmarking test datasets along with the various models (publications and source code).

    Noise models use transducers to produce erroneous sentences from correct ones with a specified probability. They induce various error types to generate a larger dataset from a smaller one, such as replacing a word with another from its confusion set; misplacing or removing punctuation; or inducing spelling, tense, noun number, or verb form mistakes. Round-trip MT, such as English-German-English translation, can also generate parallel corpora. Wikipedia edit sequences offer millions of consecutive snapshots to serve as source-target pairs. However, only a tiny fraction of those edits are language related.
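    A minimal noise-model sketch, corrupting a correct sentence by swapping words for members of their confusion sets with a given probability (the confusion sets here are hypothetical examples):

```python
import random

def corrupt(sentence, confusions, p=0.5, seed=0):
    "Randomly replace words with members of their confusion set with probability p."
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for w in sentence.split():
        alts = confusions.get(w.lower())
        if alts and rng.random() < p:
            out.append(rng.choice(alts))
        else:
            out.append(w)
    return " ".join(out)

# Hypothetical confusion sets for an article and a preposition.
CONF = {"a": ["an", "the"], "on": ["in", "at"]}
corrupt("the cat sat on a mat", CONF, p=1.0)
```

Paired with the original sentence, the corrupted output forms a synthetic source-target training example.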

  • How do I annotate or evaluate the performance of grammar and spell checkers?

    ERRor ANnotation Toolkit (ERRANT) enables suggestions with explanations. It automatically annotates parallel English sentences with error type information, thereby standardizing parallel datasets and facilitating detailed error type evaluation.

    Training and evaluation require comparing the output to the target gold standard and giving a numerical measure of effectiveness or loss. Editor models have an advantage as the sequence length of input and output is the same. Unequal sequences need alignment with the insertion of empty tokens.

    The Max-Match (\(M^2\)) scorer determines the smallest edit sequence out of the multiple possible ways to arrive at the gold standard, using the notion of Levenshtein distance. Evaluation computes precision, recall, and F1 measure between the set of system edits and the set of gold edits for all sentences, after aligning the sequences to the same length.
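    Once system and gold edits are expressed as comparable sets, the metrics reduce to set overlap. A simplified stand-in for the full \(M^2\) algorithm (edit spans here are a hypothetical representation):

```python
def f1_edits(system, gold):
    "Precision, recall, and F1 between a set of system edits and a set of gold edits."
    tp = len(system & gold)                     # true positives: edits both proposed and correct
    p = tp / len(system) if system else 1.0
    r = tp / len(gold) if gold else 1.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Edits written as (start, end, replacement) spans over the source tokens.
system = {(1, 2, "have"), (3, 4, "dogs")}
gold = {(1, 2, "have")}
f1_edits(system, gold)
# → (0.5, 1.0, 0.666...): one correct edit out of two proposed, all gold edits found
```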

    Dynamic programming can also align multiple sequences to the gold standard when there is more than one possible correct outcome.
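    The Levenshtein distance at the heart of such alignment can be computed with a classic dynamic program (a minimal sketch over characters; the same recurrence works over tokens):

```python
def levenshtein(a, b):
    "Minimum number of single-symbol insertions, deletions, and substitutions turning a into b."
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (x != y)))  # substitution (0 if symbols match)
        prev = curr
    return prev[-1]

levenshtein("speling", "spelling")
# → 1 (a single insertion)
```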

  • Could you mention some tools or libraries that implement grammar and spell checking?

    GNU Aspell is a standard utility used in the GNU OS and other UNIX-like systems. Hunspell is a spell checker that's part of popular software such as LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, Google Chrome, and more. Hunspell itself is based on MySpell. Hunspell supports one or more dictionaries, stemming, morphological analysis, and Unicode text.

    Python packages for spell checking include pyspellchecker, textblob and autocorrect.

    A search for "grammar spell" on GitHub brings up useful dictionaries or code implemented in various languages. There's a converter from British to American English. Spellcheckr is a JavaScript implementation for web frontends.

    Deep learning models include Textly-DRF-API and GECwBERT.

    Many online services or offline software also exist: WhiteSmoke from 2002, LanguageTool from 2005, Grammarly from 2009, Ginger from 2011, Reverso from 2013, and Trinka from 2020. Trinka focuses on an academic style of writing. Grammarly focuses on suggestions in terms of writing style, clarity, engagement, delivery, etc.

Milestones

1960
Abbreviation ABBT maps incorrect word 'absorbant' to the correct word 'absorbent'. Source: Blair 1960.

Blair implements a simple spelling corrector using heuristics and a dictionary of correct words. Incorrect spellings are associated with the corrected ones via abbreviations that indicate similarity between the two. Blair notes that this is in some sense a form of pattern recognition. In one experiment, the program successfully corrects 89 of 117 misspelled words. In general, research interest in spell checking and correction begins in the 1960s.

1971

R. E. Gorin writes Ispell in PDP-10 assembly. Ispell becomes the main spell-checking program for UNIX. Ispell is also credited with introducing the generalized affix description system. Much later, Geoff Kuenning implements a C++ version with support for many European languages. This is called International Ispell. GNU Aspell, MySpell and Hunspell are other software inspired by Ispell.

1980
Evolution of GEC. Source: Naghshnejad et al. 2020, fig 1.

In the 1980s, GEC systems are syntax-based systems, such as EPISTLE. They determine the syntactic structure of each sentence and the grammatical functions fulfilled by various phrases. They detect several classes of grammatical errors, such as disagreement in number between the subject and the verb.

1990

This decade focuses on simple linear classifiers to flag incorrect choice of articles, or statistical methods to identify and flag commonly confused words. Confusion can be due to identical-sounding words, typos, etc.

2000

Rule-based methods evolve in the 2000s. Rule generation is based on parse trees, designed heuristically or based on linguistic knowledge or statistical analysis of erroneous texts. These methods don't generalize to new types of errors; new rules need to be constantly added.

2005

The mid-2000s see methods to record and create aligned corpora of pre- and post-editing ESL (English as a Second Language) writing samples. SMT offers improvements in identifying and correcting writing errors. GEC sees the use of semantic and syntactic features, including PoS tags and NER information, for determining the applicable correction. Support Vector Machines (SVMs), n-gram LM-based and Naïve Bayes classifiers are used to predict the potential correction.

2010

DNN-based classifier approaches are proposed in the 2000s and early 2010s. However, a specific set of error types has to be defined; typically only well-defined errors can be addressed with these approaches. SMT models learn mappings from source text to target text using a noisy channel model. SMT-based GEC models use parallel corpora of erroneous text and a grammatically correct version of the same text in the same language. Open-source SMT engines are available online and include Moses, Joshua and cdec.

2016

Neural Machine Translation (NMT) shows better prospects by capturing some learner errors missed by SMT models. This is because NMT can encode structural patterns from training data and is more likely to capture an unseen error.

2018

With the advent of the attention-based transformer architecture in 2017, its application to GEC gives promising results.

2019

Methods to improve the training data by text augmentation of various types, including cyclic machine translation, emerge. These improve the performance of GEC tools significantly and enable better flagging of style or context-based errors or suggestions. Predicting edits instead of tokens allows the model to pick the output from a smaller confusion set. Thus, editor models lead to faster training and inference of GEC models.

Sample Code

  • # Source: https://norvig.com/spell-correct.html
    # Accessed 2021-04-25
     
    # This is Peter Norvig's implementation from 2007.
    # It relies on big.txt, a file of about a million words.
     
    import re
    from collections import Counter
     
    def words(text): return re.findall(r'\w+', text.lower())
     
    WORDS = Counter(words(open('big.txt').read()))
     
    def P(word, N=sum(WORDS.values())): 
        "Probability of `word`."
        return WORDS[word] / N
     
    def correction(word): 
        "Most probable spelling correction for word."
        return max(candidates(word), key=P)
     
    def candidates(word): 
        "Generate possible spelling corrections for word."
        return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])
     
    def known(words): 
        "The subset of `words` that appear in the dictionary of WORDS."
        return set(w for w in words if w in WORDS)
     
    def edits1(word):
        "All edits that are one edit away from `word`."
        letters    = 'abcdefghijklmnopqrstuvwxyz'
        splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
        deletes    = [L + R[1:]               for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
        replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
        inserts    = [L + c + R               for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)
     
    def edits2(word): 
        "All edits that are two edits away from `word`."
        return (e2 for e1 in edits1(word) for e2 in edits1(e1))
     
    # usage:
    correction('speling') # spelling (single deletion)
    correction('korrectud') # corrected (double replacements)
     

References

  1. Ailani, Sagar, Ashwini Dalvi, and Irfan Siddavatam. 2019. "Grammatical Error Correction (GEC): Research Approaches till now." International Journal of Computer Applications, vol. 178, no. 40, August. Accessed 2021-04-22.
  2. Atkinson, Kevin. 2018. "GNU Aspell." Accessed 2021-04-27.
  3. Baca, Marie C. 2019. "People do grammar bad. Google’s AI is hear too help." The Washington Post, August 26. Accessed 2021-04-22.
  4. Bergsma, Shane, Dekang Lin, and Randy Goebel. 2009. "Web-scale N-gram models for lexical disambiguation." IJCAI'09: Proceedings of the 21st international joint conference on Artificial intelligence, pp. 1507-1512, July. doi: 10.5555/1661445.1661687. Accessed 2021-04-25.
  5. Blair, Charles R. 1960. "A program for correcting spelling errors." Information and Control, vol. 3, no. 1, pp. 60-67. Accessed 2021-04-27.
  6. Brockett, Chris, William B. Dolan, and Michael Gamon. 2006. "Correcting ESL errors using phrasal SMT techniques." ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp. 249–256, July. Accessed 2021-04-26.
  7. Bryant, Christopher, Mariano Felice, and Ted Briscoe. 2017. "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 793-805, July 30 - August 4. Accessed 2021-04-24.
  8. Choe, Yo Joong, Jiyeon Ham, Kyubyong Park, and Yeoil Yoon. 2019. "A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning." Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 213-227, August 2. Accessed 2021-04-24.
  9. Chomal, Sunil. 2019. "GECwBERT." sunilchomal/GECwBERT, on GitHub, October 22. Accessed 2021-04-25.
  10. Dahlmeier, Daniel, and Hwee Tou Ng. 2012. "Better Evaluation for Grammatical Error Correction." Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 568-572, June 3-8. Accessed 2021-04-24.
  11. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. "BERT: Pre-training of deep bidirectional transformers for language understanding." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171-4186, June 2-7. Accessed 2021-04-24.
  12. Dyer, Chris, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. 2010. "cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models." Proceedings of the ACL 2010 System Demonstrations, pp. 7–12, Uppsala, Sweden. Accessed 2021-04-26.
  13. Encyclopædia Britannica. 2017. "Spelling and grammar checkers." Encyclopædia Britannica, November 22. Accessed 2021-04-22.
  14. Felice, Mariano, and Ted Briscoe. 2015. "Towards a standard evaluation method for grammatical error detection and correction." Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pp. 578-587, May 31 - June 5. Accessed 2021-04-24.
  15. GitHub. 2021. "Search for 'grammar spell'." GitHub. Accessed 2021-04-25.
  16. Grundkiewicz, Roman, and Marcin Junczys-Dowmunt. 2014. "The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and its Application to Grammatical Error Correction." In: Advances in Natural Language Processing -- Lecture Notes in Computer Science, Adam Przepiórkowski and Maciej Ogrodniczuk (eds), Springer, vol. 8686, pp. 478-490. Accessed 2021-04-23.
  17. Grundkiewicz, Roman, Marcin Junczys-Dowmunt, and Kenneth Heafield. 2019. "Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data." Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 252-263, August 2. Accessed 2021-04-24.
  18. Hagiwara, Masato. 2019. "GitHub Typo Corpus." mhagiwara/github-typo-corpus, on GitHub, December 11. Accessed 2021-04-25.
  19. Hunspell GitHub. 2020. "Hunspell." hunspell/hunspell, on GitHub, May 19. Accessed 2021-04-25.
  20. Junczys-Dowmunt, Marcin, Roman Grundkiewicz, Shubha Guha, and Kenneth Heafield. 2018. "Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task." Proceedings of NAACL-HLT 2018, pp. 595–606 New Orleans, Louisiana, June 1 - 6. Accessed 2021-04-26.
  21. Jurafsky, Daniel. 2019. "The Noisy Channel Model of Spelling." Stanford University, via Sichen Jin on YouTube, April 10. Accessed 2022-08-23.
  22. Jurafsky, Daniel and James H. Martin. 2020. "Spelling Correction and the Noisy Channel." Chapter B in: Speech and Language Processing, Draft for Third Edition, December 30. Accessed 2021-04-25.
  23. Knight, Kevin, and Ishwar Chander. 1994. "Automated Postediting of Documents." Proceedings of the 12th National Conference on Artificial Intelligence (AAAI94), Seattle, WA, pp. 779-784. Accessed 2021-04-26.
  24. Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. "Moses: Open Source Toolkit for Statistical Machine Translation." Proceedings of the ACL 2007 Demo and Poster Sessions, pp. 177–180. Accessed 2021-04-26.
  25. Kuenning, Geoff. 2021. "International Ispell." Accessed 2021-04-27.
  26. Li, Zhifei, Chris Callison-Burch, Chris Dyer, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Jonathan Weese, and Omar Zaidan. 2009. "Joshua: An Open Source Toolkit for Parsing-based Machine Translation." Proceedings of the Fourth Workshop on Statistical Machine Translation, pp. 135–139, Athens, Greece. Accessed 2021-04-26.
  27. Lichtarge, Jared, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, and Simon Tong. 2019. "Corpora Generation for Grammatical Error Correction." Proceedings of NAACL-HLT, pp. 3291-3301, June 2-7. Accessed 2021-04-23.
  28. Malmi, Eric, Sebastian Krause, and Sascha Rothe. 2019. "Encode, Tag, Realize: High-Precision Text Editing." Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 5054–5065, November 3–7. Accessed 2021-04-26.
  29. Miller, Lance A., George E. Heidorn, and Karen Jensen. 1981. "Text-critiquing with the EPISTLE system: an author's aid to better syntax." AFIPS '81: Proceedings of the National Computer Conference, pp. 649–655, May 4-7. Accessed 2021-04-26.
  30. Mitton, Roger. 1996. "Spellchecking by computer." Journal of the Simplified Spelling Society, vol. 20, no. 1, pp. 4-11. Accessed 2021-04-27.
  31. Mizumoto, Tomoya, Yuta Hayashibe, Mamoru Komachi, Masaaki Nagata, and Yu ji Matsumoto. 2012. "The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings." COLING 2012, 24th International Conference on Computational Linguistics, pp. 863-872, December. Accessed 2021-04-23.
  32. Mizumoto, Tomoya, Toshikazu Tajiri, Takuya Fujino, Seiji Kasahara, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. 2021. "NAIST Lang-8 Learner Corpora." Google Sites. Accessed 2021-04-23.
  33. Naber, Daniel. 2003. "A Rule-Based Style and Grammar Checker." Thesis, Diplomarbeit Technis Fakultät, Universität Bielefeld, Germany, August 28. Accessed 2021-04-26.
  34. Naghshnejad, Mina, Tarun Joshi, and Vijayan N. Nair. 2020. "Recent Trends in the Use of Deep Learning Models for Grammar Error Handling." arXiv, v1, September 4. Accessed 2021-04-22.
  35. Omelianchuk, Kostiantyn, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. 2020. "GECToR – Grammatical Error Correction: Tag, Not Rewrite." Published in: Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, via arXiv, v2, May 29. Accessed 2021-04-23.
  36. Park, Junhee. 2019. "An AI-based English Grammar Checker vs. Human Raters in Evaluating EFL Learners’ Writing." Multimedia-Assisted Language Learning, vol. 22, no. 1, pp. 112-131. doi: 10.15702/mall.2019.22.1.112. Accessed 2021-04-22.
  37. Park, Y. Albert, and Roger Levy. 2011. "Automated Whole Sentence Grammar Correction Using a Noisy Channel Model." Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 934-944, June 19-24. Accessed 2021-04-23.
  38. Powers, David M. W. 1997. "Learning and Application of Differential Grammars." T.M. Ellison (ed.) CoNLL97: Computational Natural Language Learning, ACL, pp. 88-96. Accessed 2021-04-26.
  39. Rozovskaya, Alla, and Dan Roth. 2014. "Building a State-of-the-Art Grammatical Error Correction System." Transactions of the Association for Computational Linguistics, vol. 2, pp. 419–434. Accessed 2021-04-26.
  40. Ruder, Sebastian. 2021. "Grammatical Error Correction." NLP-progress. Accessed 2021-04-23.
  41. Shaptala, Julia, and Bohdan Didenko. 2019. "Multi-headed Architecture Based on BERT for Grammatical Errors Correction." Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 246-251, August 2. Accessed 2021-04-24.
  42. Soni, Madhvi, and Jitendra Singh Thakur. 2018. "A Systematic Review of Automated Grammar Checking in English Language." arXiv, v1, March 29. Accessed 2021-04-23.
  43. Treadway, Andrew. 2019. "3 Packages to Build a Spell Checker in Python." Open Source Automation, December 10. Accessed 2021-04-25.
  44. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. "Attention Is All You Need." 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. Accessed 2021-04-26.
  45. Wikipedia. 2021. "Standard Generalized Markup Language." Wikipedia, April 9. Accessed 2021-04-23.
  46. Wikipedia. 2021b. "GNU Aspell." Wikipedia, February 1. Accessed 2021-04-25.
  47. Wikipedia. 2021c. "Ispell." Wikipedia, April 19. Accessed 2021-04-27.
  48. Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. "Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." arXiv, v2, October 8. Accessed 2021-04-26.
  49. mitya33. 2020. "Spellcheckr." mitya33/spellcheckr, on GitHub, November 11. Accessed 2021-04-25.

Further Reading

  1. Kukich, Karen. 1992. "Techniques for automatically correcting words in text." ACM Computing Surveys, vol. 24, no. 4, December. doi: 10.1145/146370.146380. Accessed 2021-04-27.
  2. LingPipe. 2021. "Spelling Tutorial." Tutorial, LingPipe. Accessed 2021-04-25.
  3. Keil, Jackson. 2020. "Advantages and Disadvantages of Grammar Checker." Books Charming, March 1. Accessed 2021-04-22.
  4. Tanwar, Ravi. 2020. "How To Use BERT Transformer For Grammar Checking?" Analytics India Magazine. Accessed 2021-04-24.
  5. Naghshnejad, Mina, Tarun Joshi, and Vijayan N. Nair. 2020. "Recent Trends in the Use of Deep Learning Models for Grammar Error Handling." arXiv, v1, September 4. Accessed 2021-04-22.

Cite As

Devopedia. 2022. "Grammar and Spell Checker." Version 15, August 23. Accessed 2023-11-12. https://devopedia.org/grammar-and-spell-checker