Text Normalization

A simple example of text normalization. Source: Geitgey 2020.

Text normalization reduces variations in word forms to a common form when the variations mean the same thing. For example, US and U.S.A. become USA; Product, product and products become product; naïve becomes naive; $400 becomes 400 dollars; +7 (800) 123 1231 becomes 0078001231231; 25 June 2015 and 25/6/15 become 2015-06-25; and so on.
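
As a minimal illustration, here's a short Python sketch that applies a few such normalizations using only the standard library. The specific rules and the acronym lookup are illustrative assumptions rather than any fixed standard.

    import re
    import unicodedata

    # Illustrative lookup of acronym variants; real systems use much larger lists.
    ACRONYMS = {"u.s.a.": "USA", "us": "USA", "usa": "USA"}

    def normalize(token):
        t = token.lower()
        # Strip accents: 'naïve' -> 'naive'
        t = unicodedata.normalize("NFKD", t).encode("ascii", "ignore").decode("ascii")
        if t in ACRONYMS:                      # map acronym variants to a canonical form
            return ACRONYMS[t]
        if len(t) > 3:                         # crude plural stripping: 'products' -> 'product'
            t = re.sub(r"s$", "", t)
        return t

    print([normalize(w) for w in ["U.S.A.", "naïve", "Products"]])
    # ['USA', 'naive', 'product']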

Before text data is used in training NLP models, it's pre-processed to a suitable form. Text normalization is often an essential step in text pre-processing. Text normalization simplifies the modelling process and can improve the model's performance.

There's no fixed set of tasks that are part of text normalization. Tasks depend on application requirements. Text normalization started with text-to-speech systems and later became important for processing social media text.

Discussion

  • What are the typical tasks within text normalization?

    We can identify the following tasks for normalizing text (a short Python sketch after this list illustrates the first three):

    • Tokenization: Text is normally broken up into tokens. A token is usually a single word but there are exceptions, such as New York.
    • Lemmatization: Reduce surface forms to their root form. For example, sang, sung and sings have a common root 'sing'.
    • Stemming: Strip suffixes. For example, trouble, troubled and troubles are stemmed to 'troubl'. This is a simpler and faster alternative to lemmatization.
    • Sentence Segmentation: Break up text into sentences using characters ., !, or ?.
    • Phonetic Normalization: Words spelled differently could sound the same. Likewise, variations in pronunciation would need to be normalized to the same token.
    • Spelling Correction: In some applications such as IR, it's useful to correct spelling errors. For example, 'infromation' is normalized to 'information'.
    • Non-Standard Words: This includes phone numbers, dates, currencies, addresses, acronyms, etc.
    • Others: Normalization may involve accents (naïve, naive), UK/US spelling (catalogue, catalog), and capital letters (Product, product).
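
    The sketch below illustrates tokenization, stemming and lemmatization with NLTK. It assumes NLTK is installed and that the required data packages can be downloaded; exact outputs and data requirements may vary with the NLTK version.

        import nltk
        from nltk.stem import PorterStemmer, WordNetLemmatizer

        nltk.download("punkt", quiet=True)      # tokenizer model
        nltk.download("wordnet", quiet=True)    # lemmatizer dictionary

        text = "The singers sang troubled songs."
        tokens = nltk.word_tokenize(text)                                      # tokenization
        stems = [PorterStemmer().stem(t) for t in tokens]                      # stemming
        lemmas = [WordNetLemmatizer().lemmatize(t, pos="v") for t in tokens]   # lemmatization (as verbs)

        print(tokens)   # ['The', 'singers', 'sang', 'troubled', 'songs', '.']
        print(stems)    # e.g. ['the', 'singer', 'sang', 'troubl', 'song', '.']
        print(lemmas)   # e.g. ['The', 'singers', 'sing', 'trouble', 'songs', '.']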
  • What are some NLP applications that benefit from text normalization?

    Information Retrieval (IR) is a typical example. If the search query is 'U.S.A.', we may want to return results for 'U.S.A.' and 'USA'. One way to do this is via query expansion in which both forms are searched. A more efficient approach is to normalize to 'USA', store all documents with this normalized form and search only for 'USA'. Wrong normalization can produce irrelevant results, such as 'C.A.T.' normalized to 'cat'.

    Conversational AI involves both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis. For example, when a user says "five p m", ASR should interpret this as "5:00PM". This is called inverse text normalization. Conversely, the text input "6:30PM" should be spoken as "six thirty p m". This is text normalization. Another example is 'Dr.', which could be interpreted as 'Drive' or 'Doctor'. Hence, the context of usage is important to determine the correct normal form.

    Machine translation, opinion mining, spell checking, sentiment analysis, dependency parsing, and named entity recognition are further examples of NLP tasks or applications that can benefit from text normalization.

  • What are some general approaches to text normalization?

    Text normalization has a few different approaches:

    • Substitution Lists: Also called wordlist mapping, lookups or memorization. Uses a precompiled list. Doesn't generalize to variants not in the list.
    • Rule-based Methods: Manually crafted rules encode regularities in variants.
    • Distance-based Methods: Edit distance measures such as Levenshtein distance are used to determine if two word forms are similar (see the sketch after this list).
    • Spelling Correction: Hidden Markov Models analyse word morphology and determine the correct spelling. However, corrections are done word by word without any context.
    • Automatic Speech Recognition (ASR): Based on the insight that microtext (social media text, SMS messages) is closer to sound forms rather than proper spelling. Decodes word sequences within a weighted phonetic framework.
    • Machine Translation (MT): Microtext is treated as a foreign language that needs to be translated. This approach captures context. Character-level Statistical Machine Translation (CSMT) maps character sequences rather than words. It's an example of the noisy channel model: a translation model followed by a language model.
    • Neural Models: Use of neural networks such as encoder-decoder model with LSTM.
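
    As referenced above, here's a minimal pure-Python sketch of Levenshtein edit distance, the measure typically used in distance-based normalization.

        def levenshtein(a, b):
            """Minimum number of insertions, deletions and substitutions to turn a into b."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                curr = [i]
                for j, cb in enumerate(b, 1):
                    cost = 0 if ca == cb else 1
                    curr.append(min(prev[j] + 1,          # deletion
                                    curr[j - 1] + 1,      # insertion
                                    prev[j - 1] + cost))  # substitution
                prev = curr
            return prev[-1]

        print(levenshtein("infromation", "information"))  # 2 (a transposition counts as two edits)
        print(levenshtein("nite", "night"))               # 3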
  • What are some approaches to text tokenization?

    In English, whitespace is used to separate words. Hence, whitespace is often used to identify tokens. Some punctuation characters could also indicate word boundaries. In social media text, :) and #nlproc would be considered as tokens.

    Contractions are often normalized to expanded forms: what're → what are, I'm → I am, isn't → is not. This sort of normalization results in two tokens from a single word. Conversely, New York is an example of two words considered as a single token.

    Tokenization of some words is far from unambiguous. Hyphens present a challenge. Should state-of-the-art become 'state of the art'? Should Hewlett-Packard become 'Hewlett Packard'? Should lower-case become 'lowercase' or 'lower case'? Some acronyms are also challenging. How should we tokenize m.p.h. and PhD?

    In Japanese and Chinese, there are no spaces to separate words. A greedy algorithm that attempts to find the longest dictionary word is often used. In French, should L'ensemble be tokenized as L, L' or Le? In German, noun compounds are not segmented and their processing is deferred to the application.

    Among the well-known tokenization approaches are Byte-Pair Encoding (BPE), WordPiece and SentencePiece.
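
    A minimal regex-based tokenizer in the spirit of social-media-aware tokenizers is sketched below; the patterns are illustrative assumptions and cover only a few of the cases discussed above.

        import re

        # Illustrative patterns: emoticons, hashtags/mentions, words with internal
        # apostrophes or hyphens, and any other single non-space character.
        TOKEN_RE = re.compile(r"""
            (?:[:;=]-?[)(DP])       # emoticons like :) ;D =(
          | (?:[#@]\w+)             # hashtags and mentions, e.g. #nlproc
          | (?:\w+(?:['-]\w+)*)     # words, contractions, hyphenated words
          | (?:\S)                  # any other non-space symbol
        """, re.VERBOSE)

        def tokenize(text):
            return TOKEN_RE.findall(text)

        print(tokenize("Loving #nlproc :) state-of-the-art isn't it?"))
        # ['Loving', '#nlproc', ':)', 'state-of-the-art', "isn't", 'it', '?']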

  • What are non-standard words that need to be normalized?
    A taxonomy of NSWs useful for hand tagging and modelling. Source: Sproat et al. 2001, table 1.

    Non-Standard Words (NSWs) include numbers, abbreviations, dates, currency amounts and acronyms. Mixed-case words (WinNT, SunOS), Roman numerals, URLs, and email addresses are more categories of NSWs.

    NSWs often occur in text alongside ordinary words and names. The challenge with NSWs is that they're not dictionary words and their interpretation tends to be ambiguous. Therefore, we need to normalize them. This essentially means replacing them with ordinary words.

    Take for example 'Pvt', which is interpreted as 'Private'. An ambiguous example is 'IV'. It could be read as four, fourth or intravenous, depending on the context. The number 1750 could refer to a year, a building number or a cardinal number. These differences are important for a TTS system that needs to determine the correct pronunciation. Should Amazon Alexa read '2/3' as 'two thirds' or 'February Third'?

    Rather than employ ad hoc techniques to handle NSWs, formal modelling has been shown to give better results. Techniques could include n-gram models, decision trees, and weighted finite-state transducers.
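
    For flavour, here's a toy rule-based expander for a few NSW classes. It's far simpler than the n-gram or WFST models just mentioned, and the lookups and rules are purely illustrative.

        import re

        ABBREV = {"Pvt": "Private", "Dr.": "Doctor", "mph": "miles per hour"}  # illustrative lookup
        ONES = "zero one two three four five six seven eight nine".split()

        def expand_nsw(token):
            if token in ABBREV:                                 # abbreviation lookup
                return ABBREV[token]
            if re.fullmatch(r"\d{1,2}:\d{2}(AM|PM)?", token):   # times like 6:30PM
                h, rest = token.split(":")
                return f"{h} {rest[:2]} {rest[2:]}".strip()
            if re.fullmatch(r"\d+", token):                     # digits read out one by one; real
                return " ".join(ONES[int(d)] for d in token)    # systems use context (year vs cardinal)
            return token                                        # ordinary word: unchanged

        print([expand_nsw(t) for t in ["Pvt", "6:30PM", "1750", "hello"]])
        # ['Private', '6 30 PM', 'one seven five zero', 'hello']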

  • What are the challenges in normalizing social media text?
    Possible edits to normalize social media text. Source: Baldwin and Li 2015, fig. 1.

    Social media text often doesn't conform to the rules of spelling, grammar or punctuation. Among its challenges are:

    • Abbreviations: nite (night), gr8 (great), sayin (saying), lol (laugh out loud), iirc (if I remember correctly), hard2tell (hard to tell)
    • Misspelling: wouls (would), rediculous (ridiculous)
    • Omitted Punctuation: im (I'm), dont (don't)
    • Slang: that was well mint (that was well good)
    • Wordplay: that was soooooo great (that was so great)
    • Disguised Vulgarities: sh1t, f**k
    • Emoticons: :) for smiling face, <3 for heart
    • Informal Transliteration: This concerns only multilingual text. Variations in transliteration occur due to long vowels, borrowed words, accents/dialects, double consonants, etc.

    Experiments have shown that normalizing these gives better performance in machine translation and spell checking. However, challenges remain. Emoticons :P and ;D are treated as spelling errors. Abbreviations 'b' for 'be' and 'c' for 'see' are not caught by spell checkers and later affect machine translation. When "I'm" is written as "im", it's misinterpreted as an abbreviation for instant messaging.
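
    A toy normalizer for such text might combine a substitution list with a rule for collapsing repeated letters, as sketched below. The word list is an illustrative assumption; real systems are far more careful, for example about context.

        import re

        # Illustrative substitution list for common microtext forms.
        SUBS = {"gr8": "great", "nite": "night", "im": "i'm", "iirc": "if i remember correctly"}

        def normalize_social(text):
            out = []
            for tok in text.lower().split():
                tok = re.sub(r"(.)\1{2,}", r"\1", tok)   # 'soooooo' -> 'so' (collapse 3+ repeats)
                out.append(SUBS.get(tok, tok))           # substitution list lookup
            return " ".join(out)

        print(normalize_social("im sooooo happy, that was gr8"))
        # i'm so happy, that was great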

  • What does it mean to normalize Unicode strings?
    Examples of Unicode normalization forms. Source: Whistler 2020, fig. 6.

    Consider the angstrom symbol Å that may require normalization. Its Unicode codepoint is U+212B. It can be decomposed into 'A' (U+0041) followed by a combining ring above (U+030A).

    Unicode characters can contain diacritical marks, ligatures, or half-width katakana characters. Unicode has defined four normalization forms:

    • Normalization Form D (NFD): Canonical Decomposition
    • Normalization Form C (NFC): Canonical Decomposition, followed by Canonical Composition
    • Normalization Form KD (NFKD): Compatibility Decomposition
    • Normalization Form KC (NFKC): Compatibility Decomposition, followed by Canonical Composition

    Canonical equivalence means that equivalent characters or sequences of characters represent the same abstract character. They display and behave the same way.

    Compatibility equivalence is a weaker type of equivalence. In this case, the visual appearance and behaviour may differ though they represent the same abstract character. For example, character ℌ becomes H and ¼ becomes 1/4. This difference may be acceptable in some applications. In some cases, applications may account for these differences with additional styling.

    Consider 'schön'. Its normal forms are 'scho\u0308n' (NFD & NFKD) and 'schön' (NFC & NFKC). Moreover, NFC and NFKC differ only in the decomposition phase.
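
    Python's standard-library unicodedata module implements these four forms. A quick check on the examples above:

        import unicodedata

        angstrom = "\u212B"     # Å, the angstrom sign
        a_ring   = "A\u030A"    # 'A' followed by combining ring above

        for form in ("NFD", "NFC", "NFKD", "NFKC"):
            same = unicodedata.normalize(form, angstrom) == unicodedata.normalize(form, a_ring)
            print(form, same)   # True for every form: both sequences normalize to the same string

        s = "scho\u0308n"       # 'schön' in decomposed form
        print(unicodedata.normalize("NFC", s) == "schön")   # True (literal 'schön' assumed precomposed)
        print(unicodedata.normalize("NFKC", "\u210C"))      # 'H'
        print(unicodedata.normalize("NFKC", "\u00BC"))      # '1⁄4' (with a fraction slash)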

  • What are some neural network approaches to text normalization?
    Text normalization with encoder-decoder model using GRUs and attention mechanism. Source: Zhang et al. 2019, fig. 6.

    Since 2016, Recurrent Neural Networks (RNNs) have been used for text normalization. In particular, a few layers of BiLSTM have been used to map character sequences to word tokens.

    For a long time CSMT was the state of the art in text normalization. Neural models generally need much larger training datasets. To overcome this limitation, Lusetti et al. (2018) trained a character-level encoder-decoder model plus a word-level language model. Beam search is used during decoding.

    Zhang et al. (2019) used transformers with good results, but these are prone to unrecoverable errors. They got better results by modifying the encoder-decoder model to capture context more effectively. Their multi-task architecture jointly trains the tagger and the normalizer.

    Memory-augmented networks have also been applied. A hybrid word-character attention-based encoder-decoder model has been used, with the character-based component trained on adversarial examples. A pointer-generator network with a transformer encoder and an auto-regressive decoder has been used, with the pointer module replacing OOV output tokens.

    For many NLP tasks in Chinese, word tokenization is not required. However, Convolutional Neural Networks (CNNs) have been used.
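
    To make the encoder-decoder idea concrete, here's a skeletal character-to-word model in PyTorch (GRUs, no attention). It assumes torch is installed; the vocabulary sizes and dimensions are placeholders, and this is a sketch, not the exact architecture of any cited work.

        import torch
        import torch.nn as nn

        CHAR_VOCAB, WORD_VOCAB, EMB, HID = 100, 5000, 32, 64   # illustrative sizes

        class Encoder(nn.Module):
            def __init__(self):
                super().__init__()
                self.emb = nn.Embedding(CHAR_VOCAB, EMB)
                self.rnn = nn.GRU(EMB, HID, batch_first=True, bidirectional=True)
            def forward(self, chars):                   # chars: (batch, src_len) of character IDs
                _, h = self.rnn(self.emb(chars))        # h: (2, batch, HID)
                return torch.cat([h[0], h[1]], dim=-1)  # (batch, 2*HID) summary of the input

        class Decoder(nn.Module):
            def __init__(self):
                super().__init__()
                self.emb = nn.Embedding(WORD_VOCAB, EMB)
                self.rnn = nn.GRU(EMB, 2 * HID, batch_first=True)
                self.out = nn.Linear(2 * HID, WORD_VOCAB)
            def forward(self, prev_words, enc_state):   # teacher-forced decoding
                h0 = enc_state.unsqueeze(0)             # (1, batch, 2*HID)
                o, _ = self.rnn(self.emb(prev_words), h0)
                return self.out(o)                      # (batch, tgt_len, WORD_VOCAB) logits

        enc, dec = Encoder(), Decoder()
        chars = torch.randint(0, CHAR_VOCAB, (2, 12))   # dummy batch of character IDs
        prev  = torch.randint(0, WORD_VOCAB, (2, 4))    # dummy previous output words
        print(dec(prev, enc(chars)).shape)              # torch.Size([2, 4, 5000])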

  • Could you mention some useful developer tools for text normalization?

    In Python, many NLP software libraries support text normalization, particularly tokenization, stemming and lemmatization. Some of these include NLTK, Hunspell, Gensim, SpaCy, TextBlob and Pattern. More tools are listed in an online spreadsheet.

    The Penn Treebank tokenization standard is applied to treebanks released by the Linguistic Data Consortium (LDC). This standard keeps hyphenated words together, breaks up contractions (doesn't → does and n't), and separates out all punctuation ($10 → $ and 10).
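
    NLTK ships a tokenizer implementing this standard. A quick example, assuming NLTK is installed (exact output may vary slightly across versions):

        from nltk.tokenize import TreebankWordTokenizer

        tokens = TreebankWordTokenizer().tokenize("She doesn't owe $10.")
        print(tokens)
        # ['She', 'does', "n't", 'owe', '$', '10', '.']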

    For Unicode normalization, the International Components for Unicode page links to many useful resources including open source software. There's also an online demo at Unicode.org and a Unicode normalization FAQ.

    In R, utf8_normalize from the utf8 package performs Unicode normalization. For other text analysis, the R packages tidytext, tm, SnowballC and topicmodels are useful.

    Wolfram supports many levels of text normalization: character-level, word-level, sentence-level, morphological and linguistic.

Milestones

1987

An early example of text normalization in the context of Text-to-Speech (TTS) is in a system named MITalk. Normalization is achieved using hard-coded rules in either Fortran or C.

1996

In the Bell Labs multilingual TTS system, Weighted Finite State Transducer (WFST) is used for text normalization. Instead of doing this as a pre-processing step, normalization is done along with other linguistic tasks. To consider context, language model transducers are used. The method identifies many possible interpretations and selects the best path using Viterbi algorithm. As late as 2014, this approach continues to be used in practice, such as in Google's Kestrel system.

2001
Architecture of a text normalization system. Source: Sproat et al. 2001, fig. 2.

Sproat et al. give a taxonomy of NSWs. They also treat text normalization as a language modelling problem. For TTS application, they present both supervised and unsupervised machine learning approaches, with the latter being a better choice for new domains.

2005

With the growth of social media, there's a need to normalize such text. From about the mid-2000s, this drives interest in text normalization for social media text.

Jul
2006

Aw et al. propose the metaphor of Machine Translation (MT) for normalizing SMS messages. The idea is to "translate" SMS language to English by adapting a phrase-based statistical MT model. For alignment during training, they use the EM algorithm and Viterbi search. They show improved BLEU scores. They also show that downstream English-to-Chinese translations improve.

Oct
2007

Choudhury et al. apply Hidden Markov Model (HMM) to the problem of normalizing SMS messages. Non-standard tokens are the emission states. They also adopt the spell checking metaphor and process text at character level rather than word level.

Nov
2011

Pennell and Liu introduce a character-level MT method. Examples of character-level mappings are 'a'→'er', '@'→'at', and '8'→'ate'. This is only the first phase where possible expansions are identified. In the second phase, a language model is used to choose the correct expansion in context.

Dec
2012
Alignment of 'ystrdy' and 'yesterday' using (a) Character-level MT and (b) Character-block level MT. Source: Li and Liu 2012, fig. 1.

Li and Liu propose an algorithm in which input is blocks of characters segmented by phonetic similarity. They use two-step MT, translating non-standard words to phones, then phones to words. They use spell checking for simple corrections. In the example, character-level MT misaligns the second 'e' but character-block level MT gets it right.

2013

Previous work often treated text normalization as replacing out-of-vocabulary or non-standard words with dictionary words. Researchers realize that text normalization can't be a "one-size-fits-all" approach: the downstream NLP task or application matters. Zhang et al. normalize with a view to improving the performance of dependency parsing rather than simply evaluating based on word error rate and BLEU score. Wang and Ng normalize social media text for better machine translation. Along with word replacement, they recover missing words and correct punctuation.

2015

Baldwin and Li normalize social media text. They evaluate the effect of normalization on three downstream applications: dependency parsing, NER and TTS. They also study the effect of each normalization edit on each of these applications. For example, only word replacements are critical for NER. For parsing, word replacements, token addition and removal edits are important. For TTS, it's critical to remove non-standard tokens while word addition is important but less so.

Oct
2016

Sproat and Jaitly present neural models for text normalization. In particular, they use a few layers of BiLSTM. In one architecture, they train a BiLSTM channel model to map characters to word tokens, followed by another LSTM for language modelling. In another architecture, they use 4-layer attention-based BiLSTM sequence-to-sequence model. This performs better than the first one. An FST-based filter improves results further.

Aug
2017

Van Esch and Sproat present a revised taxonomy of NSWs. They note that an earlier taxonomy from 2001 is inadequate due to many new categories that have come about due to social media. They present as many as 12 tables of various semiotic classes with useful examples for each. Some of these are word-like tokens, basic numbers, identifiers, dates, times, percentages, measures, geographic entities, and formulae.

Jun
2019
Historical variations of the word 'their'. Source: Bollmann 2019, fig. 1.

Historical texts also need to be normalized. Bollmann evaluates and analyses the performance of three systems that do this: Norma (rule-based, distance-based, supervised), cSMTiser (CSMT with additional language modelling data), and Neural Machine Translation (NMT). He considers texts from many languages, some dating back to the 14th century. cSMTiser outperforms NMT in most cases. Norma could be used if there's limited training data.

Jun
2019

It's important to normalize NSWs correctly in spoken dialogue systems such as Amazon Alexa. Mansfield et al. approach this as a machine translation problem and use sequence-to-sequence modelling. For better context, they apply an attention mechanism to subword units rather than words. With subwords, input size is reduced and OOV words are handled better. BPE is used to create a subword inventory and SentencePiece to find its optimal size. They improve performance further by using linguistic features: POS, position, capitalization, and edit labels.

References

  1. Aw, AiTi, Min Zhang, Juan Xiao, and Jian Su. 2006. "A Phrase-Based Statistical Model for SMS Text Normalization." Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, ACL, pp. 33-40, July. Accessed 2020-12-21.
  2. Baldwin, Tyler, and Yunyao Li. 2015. "An In-depth Analysis of the Effect of Text Normalization in Social Media." Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 420–429, May-June. Accessed 2020-12-19.
  3. Bollmann, Marcel. 2019. "A Large-Scale Comparison of Historical Text Normalization Systems." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 3885-3898, June. Accessed 2020-12-19.
  4. Cartwheel Technologies. 2017. "Natural Language Processing - Text Normalization." Cartwheel Technologies, December 17. Accessed 2020-12-19.
  5. Choudhury, Monojit, Rahul Saraf, Vijit Jain, Sudeshna Sarkar, and Anupam Basu. 2007. "Investigation and Modeling of the Structure of Texting Language." International Journal of Document Analysis and Recognition (IJDAR), vol. 10, pp. 157-174, October. Accessed 2020-12-21.
  6. Clark, Eleanor, and Kenji Araki. 2011. "Text Normalization in Social Media: Progress, Problems and Applications for a Pre-Processing System of Casual English." Procedia - Social and Behavioral Sciences, Elsevier, vol. 27, pp. 2-11. Accessed 2020-12-19.
  7. Davydova, Olga. 2018. "Text Preprocessing in Python: Steps, Tools, and Examples." Data Monsters, on Medium, October 16. Accessed 2020-12-19.
  8. Ganesan, Kavita. 2019. "All you need to know about text preprocessing for NLP and Machine Learning." KDNuggets, April. Accessed 2020-12-19.
  9. Geitgey, Adam. 2020. "Build Your Own ‘Google Translate’-Quality Machine Translation System." Medium, May 4. Accessed 2020-12-21.
  10. JetBrains Academy. 2020. "Theory: Text normalization." Hyperskill, JetBrains Academy. Accessed 2020-12-19.
  11. Jurafsky, Daniel. 2015. "Basic Text Processing." Stanford University, August 8. Accessed 2020-12-19.
  12. Jurafsky, Daniel and James H. Martin. 2019. "Regular Expressions, Text Normalization, Edit Distance." Chapter 2 in: Speech and Language Processing, Third Edition draft, October 2. Accessed 2020-12-19.
  13. Li, Chen, and Yang Liu. 2012. "Improving Text Normalization using Character-Blocks Based Models and System Combination." Proceedings of COLING 2012, pp. 1587-1602, December. Accessed 2020-12-19.
  14. Lourentzou, Ismini, Kabir Manghnani, and ChengXiang Zhai. 2019. "Adapting Sequence to Sequence models for Text Normalization in Social Media." arXiv, v1, April 12. Accessed 2020-12-19.
  15. Lusetti, Massimo, Tatyana Ruzsics, Anne Göhring, Tanja Samardžić, and Elisabeth Stark. 2018. "Encoder-Decoder Methods for Text Normalization." Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), ACL, pp. 18-28, August. Accessed 2020-12-19.
  16. ML Wiki. 2015. "Text Normalization." ML Wiki, June 27. Accessed 2020-12-19.
  17. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. 2008. "Normalization (equivalence classing of terms)." Chapter 2 in: Introduction to Information Retrieval, Cambridge Univ. Press. Accessed 2020-12-19.
  18. Mansfield, Courtney, Ming Sun, Yuzong Liu, Ankur Gandhe, and Björn Hoffmeister. 2019. "Neural Text Normalization with Subword Units." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pp. 190-196, June. Accessed 2020-12-19.
  19. Nguyen, Hoang, and Sandro Cavallari. 2020. "Neural Multi-task Text Normalization and Sanitization with Pointer-Generator." Proceedings of the First Workshop on Natural Language Interfaces, ACL, pp. 37-47, July. Accessed 2020-12-19.
  20. Oracle Docs. 2020. "Normalizing Text." The Java™ Tutorials, Oracle. Accessed 2020-12-19.
  21. Pennell, Deana, and Yang Liu. 2011. "A Character-Level Machine Translation Approach for Normalization of SMS Abbreviations." Proceedings of 5th International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, pp. 974-982, November. Accessed 2020-12-21.
  22. Pramanik, Subhojeet, and Aman Hussain. 2019. "Text normalization using memory augmented neural networks." arXiv, v3, April 3. Accessed 2020-12-19.
  23. R-Project. 2018. "utf8_normalize." R Documentation, package utf8, v1.1.4, Unicode Text Processing, May 28. Accessed 2020-12-19.
  24. Satapathy, Ranjan. 2018. "Text Normalization in Natural Language Processing (NLP): An Introduction [Part 1]." Lingvo Masino, on Medium, January 9. Accessed 2020-12-19.
  25. Singh, Rajat, Nurendra Choudhary, and Manish Shrivastava. 2018. "Automatic Normalization of Word Variations in Code-Mixed Social Media Text." arXiv, v1, April 3. Accessed 2020-12-19.
  26. Sproat, Richard. 1996. "Multilingual text analysis for text-to-speech synthesis." Proc. ECAI Workshop, Extended Finite State Models of Language. Accessed 2020-12-21.
  27. Sproat, Richard, and Navdeep Jaitly. 2017. "RNN Approaches to Text Normalization: A Challenge." arXiv, v2, January 24. Accessed 2020-12-19.
  28. Sproat, Richard, Alan W. Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, and Christopher Richards. 2001. "Normalization of non-standard words." Computer Speech and Language, Academic Press, vol. 15, pp. 287-333. Accessed 2020-12-19.
  29. Sun, Ming. 2019. "Should Alexa Read “2/3” as “Two-Thirds” or “February Third”?: The Science of Text Normalization." Blog, Amazon Science, May 16. Accessed 2020-12-19.
  30. Unicode. 2019. "Normalization." FAQ, Unicode, September 13. Accessed 2020-12-19.
  31. Van Esch, Daan, and Richard Sproat. 2017. "An Expanded Taxonomy of Semiotic Classes for Text Normalization." INTERSPEECH 2017, ISCA, pp. 4016-4020, August 20-24. Accessed 2020-12-21.
  32. Wang, Pidong and Hwee Tou Ng. 2013. "A beam-search decoder for normalization of social media text with application to machine translation." Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 471-481, June. Accessed 2020-12-21.
  33. Whistler, Ken. 2020. "Unicode Normalization Forms." Unicode® Standard Annex #15, Unicode 13.0.0, February 24. Accessed 2020-12-19.
  34. Wolfram. 2020. "Text Normalization." Reference, Wolfram Language & System Documentation Center. Accessed 2020-12-19.
  35. Zhang, Congle, Tyler Baldwin, Howard Ho, Benny Kimelfeld, and Yunyao Li. 2013. "Adaptive parser-centric text normalization." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1159-1168, August. Accessed 2020-12-21.
  36. Zhang, Hao, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, and Brian Roark. 2019. "Neural Models of Text Normalization for Speech Applications." Computational Linguistics, vol. 45, no. 2, June. doi: 10.1162/coli_a_00349. Accessed 2020-12-19.

Further Reading

  1. Veliz, Claudia Matos, Orphee De Clercq, and Veronique Hoste. 2019. "Comparing MT Approaches for Text Normalization." Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pp. 740-749, September. Accessed 2020-12-19.
  2. Sproat, Richard, Alan W. Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, and Christopher Richards. 2001. "Normalization of non-standard words." Computer Speech and Language, Academic Press, vol. 15, pp. 287-333. Accessed 2020-12-19.
  3. Zhang, Hao, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, and Brian Roark. 2019. "Neural Models of Text Normalization for Speech Applications." Computational Linguistics, vol. 45, no. 2, June. doi: 10.1162/coli_a_00349. Accessed 2020-12-19.
  4. Sproat, Richard, and Navdeep Jaitly. 2017. "RNN Approaches to Text Normalization: A Challenge." arXiv, v2, January 24. Accessed 2020-12-19.
  5. Baldwin, Tyler, and Yunyao Li. 2015. "An In-depth Analysis of the Effect of Text Normalization in Social Media." Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 420–429, May-June. Accessed 2020-12-19.

Cite As

Devopedia. 2020. "Text Normalization." Version 2, December 21. Accessed 2023-11-12. https://devopedia.org/text-normalization