Large Language Model

A selection of LLMs on a timeline. Source: Zhao et al. 2023, fig. 3.

By seeing lots of text, a language model learns the probability of a sequence of words. A Large Language Model (LLM) also learns certain nuances of the language itself. Without being explicitly taught the rules of grammar, it encodes in its model parameters the syntax of the language and word semantics.

Given an input prompt, an LLM predicts the next most probable word. Hence, LLMs are generative in nature. LLMs come under the more general discipline of Generative AI.
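
To make next-word prediction concrete, here is a minimal sketch of a bigram model that estimates the most probable next word from word-pair counts. It is far simpler than an LLM (no neural network, only one word of context), and the three-sentence corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows another across a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

def predict_next(counts, word):
    """Return the most probable next word, or None if the word was never seen."""
    probs = next_word_probs(counts, word)
    return max(probs, key=probs.get) if probs else None

# Toy corpus (illustrative only).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(next_word_probs(model, "the"))
print(predict_next(model, "sat"))  # on
```

An LLM replaces the count table with billions of learned parameters and conditions on a long context rather than a single preceding word.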

There's no definite answer to what makes an LLM large. Any model that is trained on billions of words and learns a few billion parameters is perhaps an LLM. Above a certain threshold size, LLMs are seen to exhibit emergent behaviour.

Most users will use pre-trained LLMs, perhaps fine-tune them for specific use cases, and invoke them via apps.


  • How are LLMs trained and deployed?
    LLM pre-training, fine-tuning and prompting. Source: Wolfe 2024.

    An LLM is trained on lots of unlabelled data. This is self-supervised learning: the model automatically learns latent patterns and relationships. Training data comes from a variety of sources including webpages, books, discussion forums, technical journals, code samples, product documentation, etc. The end result is a Pre-Trained Language Model (PLM).

    It's possible to deploy a PLM directly for inference. However, a PLM is what we call a Foundation Model (FM): it isn't trained for any specific task, such as language translation, code generation or text summarization. For better results, the PLM serves as the foundation that is then fine-tuned for a specific task. Fine-tuning uses task-specific or domain-specific data (which may also include labels). This dataset is much smaller than the pre-training dataset. The end result is a Fine-Tuned LLM.

    LLMs are typically deployed in the cloud. Users interact with apps, and the apps call the LLM APIs. Apps may use agents to mediate interactions between users and the LLM. Agents have personas and memory. They have access to tools and external knowledge sources. Agents may enhance a user query before it's fed into the LLM as a prompt.

  • What's the internal architecture of an LLM?
    Architecture of GPT LLM. Source: Lee 2023, fig. 1.

    LLMs are based on the transformer architecture that was invented in 2017. A transformer is a specific type of Artificial Neural Network (ANN). An LLM typically has many attention layers. Each layer consists of a multi-head attention block plus a feedforward neural network (FFNN). The output of one layer feeds into the next. Finally, an FFNN outputs the next token. Attention itself is an approach to learn and quantify how a word is related to other words surrounding it.
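
    As a rough illustration of how attention quantifies the relation between words, the sketch below computes scaled dot-product attention for a single head in plain Python. It assumes the queries, keys and values are the token vectors themselves; a real transformer first multiplies them by learned projection matrices. The three 2-dimensional vectors are made-up numbers.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head, on lists of vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy vectors for three tokens (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

    Each row of weights shows how strongly one token attends to every token in the sequence. Multi-head attention runs several such computations in parallel with different projections.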

    The original transformer used an encoder-decoder architecture because that research focused on machine translation. For example, to translate a sentence from English to French, the encoder would encode the English sentence and the decoder would predict the French words one at a time. BART (2019) and T5 (2019) used the encoder-decoder architecture. BERT (2018) was an encoder-only transformer. Most modern transformers including GPT-4, Claude 3 and Llama 3 are decoder-only. Decoder-only transformers are autoregressive, that is, each word is generated based on preceding words.
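
    The autoregressive loop can be sketched as follows. The function names and the lookup table standing in for the model are hypothetical; a real decoder conditions on all preceding tokens, whereas this toy looks only at the last one.

```python
def generate(next_token, prompt, max_new_tokens=10, stop="<eos>"):
    """Repeatedly append the predicted token until a stop token or the limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)  # prediction depends on tokens generated so far
        if token == stop:
            break
        tokens.append(token)
    return tokens

# Hypothetical stand-in for a trained model: a fixed lookup on the last token.
TABLE = {"le": "chat", "chat": "dort", "dort": "<eos>"}

def toy_next_token(tokens):
    return TABLE.get(tokens[-1], "<eos>")

print(generate(toy_next_token, ["le"]))  # ['le', 'chat', 'dort']
```

    Because each step feeds on the output of earlier steps, generation is inherently sequential even though training can be parallelized.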

    Recent research has brought a few variations of transformers. In addition, there are attempts to reformulate attention as an RNN.

  • What are the building blocks of LLM?
    Illustrating GPT-4's tokenization. Source: OpenAI 2024.

    Computers don't understand words the way humans do. Words are therefore represented as numbers. Transformers use a sequence of numbers called a vector. The number of items in a vector is called its dimension.

    In practice, a word is decomposed into one or more basic units called tokens. LLMs internally deal with tokens. There are many tokenization methods: Byte Pair Encoding (BPE), SentencePiece, Unigram, WordPiece, etc. Training a tokenizer yields a vocabulary of tokens. The mapping from text to tokens in this vocabulary is called an encoding.
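
    To show how a tokenizer arrives at its vocabulary, here is a minimal sketch of the merge-learning step of character-level BPE. Production tokenizers work on bytes, handle whitespace and use far larger corpora; the word frequencies below are made up for illustration.

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Start from characters and greedily merge the most frequent pair."""
    words = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

# Toy corpus statistics (illustrative only).
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, words = learn_bpe(word_freqs, 3)
print(merges)
print(list(words))
```

    Frequent character sequences such as "est" become single tokens, while rare words remain split into smaller pieces.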

    Embeddings are learned representations of tokens. Learning happens via language modelling tasks such as predicting the next token or masked tokens. Embeddings therefore capture the context in which tokens occur in the language. Mathematically, embeddings are tokens represented as vectors. Tokens that are similar or related (king and queen, coffee and tea) are likely to be close to one another in the vector space.
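
    Closeness in the vector space is commonly measured with cosine similarity. The sketch below uses hand-written 3-dimensional vectors purely for illustration; real embeddings are learned and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors, chosen so that related words point in similar directions.
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.8, 0.9, 0.1],
    "coffee": [0.1, 0.2, 0.9],
    "tea":    [0.2, 0.1, 0.9],
}

for a, b in [("king", "queen"), ("coffee", "tea"), ("king", "coffee")]:
    print(a, b, round(cosine_similarity(embeddings[a], embeddings[b]), 3))
```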

    ChatGPT uses Byte-level BPE (BBPE) with a 100k-token vocabulary, at about 100 tokens per 75 words. An example embedding model from OpenAI is text-embedding-3-large, which produces 3072-dimensional vectors. Assuming 32-bit floats, storing one such vector for each of the 100k tokens would require \(3072 \cdot 4 \cdot 100000\) bytes \(\approx 1.23\) GB.
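
    The arithmetic above can be reproduced with a few lines of code. The function names are hypothetical, and the result uses decimal gigabytes (1 GB = 10^9 bytes).

```python
def embedding_table_bytes(vocab_size, dimensions, bytes_per_value=4):
    """Memory needed to store one vector per token in the vocabulary."""
    return vocab_size * dimensions * bytes_per_value

def estimate_tokens(num_words, tokens_per_word=100 / 75):
    """Rough token count from a word count, using about 100 tokens per 75 words."""
    return round(num_words * tokens_per_word)

size = embedding_table_bytes(vocab_size=100_000, dimensions=3072)
print(f"{size:,} bytes = {size / 1e9:.2f} GB")  # 1,228,800,000 bytes = 1.23 GB

# Halving the precision to 16-bit floats halves the memory.
half = embedding_table_bytes(100_000, 3072, bytes_per_value=2)
print(f"{half / 1e9:.2f} GB")  # 0.61 GB

print(estimate_tokens(750))  # 1000
```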

  • What techniques are being used to improve LLMs?

    Fine-tuning takes a foundation model and trains it for specific tasks. With this approach, many fine-tuned models can be obtained from a common foundation model. Fine-tuning is a lot less expensive than pre-training. It also requires far less training data. Some methods fine-tune the entire model. Others add extra parameters and only these are fine-tuned.

    It's possible to elicit better responses from PLMs just by customizing the prompts. In this approach, called Prompt Engineering or In-Context Learning (ICL), user queries are enhanced with templates and a few illustrative examples of the task at hand.
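
    A few-shot prompt is just text assembled from a template. The sketch below builds one for sentiment classification; the template wording, function name and example reviews are all hypothetical, and the resulting string would be sent to whichever LLM API is in use.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Combine an instruction, worked examples and the new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # The prompt ends where the model is expected to continue.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)

examples = [
    ("The battery lasts all day.", "positive"),
    ("It stopped working after a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "Setup was quick and painless.",
)
print(prompt)
```

    No model weights change here: the examples steer the model only for the duration of this one prompt.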

    Retrieval Augmented Generation (RAG) is a technique in which LLMs are given additional context along with the prompt. Given a user query, relevant context is retrieved from a knowledge base. This knowledge base contains private, up-to-date or domain-specific data. Context helps LLMs to generate more accurate responses.
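
    The retrieve-then-prompt flow can be sketched as below. This toy ranks documents by word overlap with the query; real RAG systems typically use embedding similarity and a vector database instead. The product, the knowledge base and the function names are invented for illustration.

```python
def tokenize(text):
    """Lowercase the text, strip basic punctuation and return the set of words."""
    for mark in "?.,":
        text = text.replace(mark, " ")
    return set(text.lower().split())

def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_rag_prompt(query, documents, top_k=1):
    """Prepend the retrieved context to the user query."""
    context = "\n".join(retrieve(query, documents, top_k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical private knowledge base.
knowledge_base = [
    "The X100 blender has a two-year warranty.",
    "The X100 blender jar holds 1.5 litres.",
    "Returns are accepted within 30 days of purchase.",
]
query = "How long is the warranty on the X100 blender?"
print(build_rag_prompt(query, knowledge_base))
```

    The LLM never needs to have seen the warranty terms during training, since the relevant sentence arrives as part of the prompt.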

    LLMs are huge but their capacity is often underutilized. This means that similar performance can be obtained by a smaller model. Quantization, knowledge distillation and pruning are techniques for model compression. Using high-quality high-volume pre-training data, it's also possible to train a smaller model with only a small loss of accuracy.
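
    Of the compression techniques named above, quantization is the easiest to illustrate. The sketch below applies symmetric 8-bit quantization with a single scale factor to a handful of made-up weights; practical schemes usually quantize per channel or per block and may use calibration data.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

# Illustrative weights only.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)
for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

    Each weight now needs one byte instead of four, at the cost of a rounding error of at most half the scale factor per weight.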

  • What are some applications of LLM?
    LLMs fine-tuned for various tasks. Source: Merritt 2023.

    Considering text-only applications, LLMs are being used for information extraction, text summarization, question answering, commonsense reasoning, sentiment analysis, content generation, code generation, language translation, and more. For example, companies can provide customer support via chatbots that have access to product documents, user manuals and warranty information. E-commerce websites can provide an auto-generated product review based on user-written reviews. Job-seekers can use LLMs to write resumes customized for each job description. LLMs can create case summaries for legal teams or extract the sentiment from financial reports.

    Multimodal applications span not just text but also speech, audio, image and video content. Applications include image captioning, object recognition, image generation, image enhancement, speech transcription, speech recognition, video generation, video question answering, video segmentation, and more. Video search and retrieval is possible. An audio podcast can be created from a technical publication. In radiology, LLMs can process text, handwritten notes and MRI scans for diagnosis. LLMs can present financial information in the form of charts.

  • What are some examples of LLMs?

    Among the FMs are Claude 3 (Anthropic), Gemini (Google), GPT-4 (OpenAI), Jurassic-2 (AI21 Labs), PaLM 2 (Google), Stable LM (Stability AI), and Titan Text G1 (Amazon). Among the open-source FMs are Command R (Cohere), Falcon 180B (TII), Jamba (AI21 Labs), Llama 3 (Meta), Mixtral 8x22B (Mistral AI), and T5 (Google). Open source means that architecture, weights and in some cases even pre-training datasets are published.

    InstructGPT is fine-tuned from GPT-3 to follow instructions and align with human values. It can be used for technical documentation, customer support, translation, etc. Similar instruct LLMs fine-tuned from their corresponding base models include Alpaca-7B, Dolly-7B, Falcon-7B-Instruct and Mistral-7B-Instruct. WizardMath is fine-tuned from Llama-2 to solve math problems. Code Llama is fine-tuned from Llama-2 for code generation in many popular programming languages. Likewise, Codex is fine-tuned from GPT-3. Flan-T5 is fine-tuned from T5 on many tasks.

    Some models are specialized during pre-training itself and fine-tuned further if necessary. Domain-specific PLMs include BloombergGPT (financial) and BioBERT (biomedical). CroissantLLM is pre-trained on English and French tokens. It can then be fine-tuned for chat, translation and summarization tasks. PolyCoder is pre-trained on code in a dozen programming languages.

  • What are some challenges with LLMs?

    LLMs can give seemingly convincing answers that are wrong. This is called hallucination. Answers could be self-contradictory, nonsensical, or ungrounded with respect to the input context. RAG and RLHF mitigate this problem.

    From their training data, LLMs also pick up harmful content and behaviour such as bias, hate speech and self-harm content, and they can be manipulated through jailbreaks. Their responses may leak private or copyrighted information. It's therefore wise to filter responses before delivering them to users. Bad actors could use LLMs to create misinformation, adversarial attacks and malware.

    LLMs are pre-trained on billions or even trillions of tokens. At such volumes, ensuring high-quality data is a challenge. Data contamination happens when test data finds its way into training datasets, which inflates evaluation scores. Data pollution happens when LLM-generated data (with hallucination and misinformation) gets used for training the next generation of models. Training LLMs with synthetic training data can lead to model collapse.

    Evaluation metrics and benchmarks are far from ideal. These may fail to evaluate LLMs at a qualitative level or in a domain-specific manner. In production, evaluation needs to be done continuously since models tend to drift.



Interest in language modelling is motivated by speech recognition. Statistical methods are used, with n-gram modelling being a popular approach. Language models in this era are therefore named Statistical Language Models (SLMs).


Bengio et al. propose a language model based on a feedforward neural network. They learn a distributed representation of words (which would later be called embeddings) and a joint probability function of word sequences. Thus is born the Neural Language Model (NLM). In later years, RNN and LSTM architectures are used for NLM.

Encoder-decoder model with attention. Source: Weng 2018, fig. 4.

For machine translation, Bahdanau et al. introduce the concept of attention to a seq2seq model. The decoder "pays attention" to important parts of the source sentence. This allows the decoder to do soft alignment with the encoder. Thus, the encoder isn't forced to compress all relevant information into a single vector.

Example of self-attention within a word sequence. Source: Weng 2018.

Vaswani et al. propose the transformer model based on the concept of self-attention where words attend to other words in the sequence. They use an encoder-decoder architecture. Essential concepts used in the research include input/output embeddings, positional embeddings, and multi-head attention. Unlike RNNs, transformers can be parallelized.

BERT is bidirectional while GPT is not. Source: Devlin et al. 2019, fig. 3.

The idea of a Pre-Trained Language Model (PLM) that can later be fine-tuned for specific tasks is born. Two models released as PLMs are GPT (June) and BERT (October). Both use transformers but GPT is autoregressive and decoder-only whereas BERT is bidirectional and encoder-only. Due to these PLMs, some think that "NLP's ImageNet moment has arrived" and 2018 is NLP's "watershed moment".

Illustrating PEFT (2b) and comparing it with full finetuning (2a). Source: Raschka 2023.

Rather than fine-tune all parameters of an LLM, it's more efficient to fine-tune only a few parameters. This approach is called Parameter-Efficient Fine-Tuning (PEFT). In time, this approach becomes popular and many variants of PEFT are proposed: LoRA (2021), (IA)³ (2022), and QLoRA (2023).
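
The saving from PEFT is easy to quantify for LoRA, which freezes a weight matrix of size d_out × d_in and trains two low-rank matrices of sizes d_out × r and r × d_in instead. The sketch below counts trainable parameters for a single matrix; the 4096 × 4096 size and rank 8 are assumed values chosen for illustration, not taken from any particular model.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters in the two low-rank matrices added by LoRA."""
    return rank * (d_out + d_in)

# Assumed layer size and rank (illustrative only).
d_out = d_in = 4096
rank = 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
# full: 16,777,216  LoRA: 65,536  ratio: 0.39%
```

Under these assumptions, fewer than half a percent of the parameters in this matrix are trained, while the original weights stay untouched and can be shared across many fine-tuned variants.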

In-Context Learning with a few examples of the task. Source: Bashir 2023.

New research suggests that foundation models can be applied with better results even without fine-tuning. One approach called Retrieval-Augmented Generation supplements LLMs with external knowledge sources. The sources are searched and relevant information is added to the query before prompting the LLM. In another approach called In-Context Learning (ICL) the prompt includes a few examples of the task at hand.

InstructGPT outperforms GPT-3 at different model sizes. Source: OpenAI 2022a.

In January 2022, OpenAI announces InstructGPT, a model that's fine-tuned from GPT-3. Unlike GPT-3 that can sometimes give wrong or unhelpful answers, InstructGPT is aligned to user needs. It can follow instructions in the prompts. Fine-tuning combines supervised learning with a technique called Reinforcement Learning from Human Feedback (RLHF). In November, OpenAI releases to the public a similar conversational model called ChatGPT fine-tuned from GPT-3.5. ChatGPT becomes so popular that it reaches a 100M user base within 2 months.

LLMs by size, volume of training data and release date. Source: Bhayana 2024, fig. 1.

Google releases PaLM-2-340B pre-trained on 3.6T tokens. Their larger model PaLM-540B (April 2022) was trained on only 780B tokens. Thus, we see research interest in smaller models pre-trained on relatively larger datasets. This is motivated by the Chinchilla Scaling Law, first noted in 2022. Other small models released in 2023 include Llama-2-7B, Mistral-7B, Orca-2-7B, and Phi-2-2.7B. Compare these with OpenAI's GPT-4 (March 2023), which according to an unconfirmed leak has 1.8T parameters and was trained on 13T tokens.
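
The shift towards more data per parameter shows up when the figures above are expressed as tokens per parameter. The sketch below does that arithmetic. The value of roughly 20 tokens per parameter is a commonly quoted approximation of the Chinchilla result and is used here as an assumption, not as an exact law.

```python
# Commonly quoted rule of thumb derived from the Chinchilla paper (an assumption).
CHINCHILLA_TOKENS_PER_PARAM = 20

# (parameters, training tokens) as given in the text above.
models = {
    "PaLM-540B": (540e9, 780e9),
    "PaLM-2-340B": (340e9, 3.6e12),
}

def tokens_per_param(params, tokens):
    """Training tokens seen per model parameter."""
    return tokens / params

def compute_optimal_tokens(params, ratio=CHINCHILLA_TOKENS_PER_PARAM):
    """Approximate token budget suggested by the rule of thumb."""
    return params * ratio

for name, (params, tokens) in models.items():
    print(f"{name}: {tokens_per_param(params, tokens):.1f} tokens per parameter")
# PaLM-540B: 1.4 tokens per parameter
# PaLM-2-340B: 10.6 tokens per parameter

print(f"{compute_optimal_tokens(7e9) / 1e9:.0f}B tokens for a 7B model")  # 140B tokens for a 7B model
```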


Google's Gemini 1.5 Pro claims a context window of 1M tokens. This is an advancement compared to 100K in Claude 1.3 (Mar 2023) and 128K in GPT-4 Turbo (Nov 2023). A context window is essentially the number of tokens (input and output) over which attention is computed. A larger context window can capture long-distance relationships.


  1. AWS. 2024a. "Supported large language models for fine-tuning." Documentation, Amazon SageMaker, AWS. Accessed 2024-05-23.
  2. AWS. 2024b. "Supported foundation models in Amazon Bedrock." Documentation, AWS Bedrock, AWS. Accessed 2024-05-25.
  3. Aghajanyan, A., L. Zettlemoyer, and S. Gupta. 2020. "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning." v1, arXiv, December 22. Accessed 2024-05-21.
  4. Alammar, Jay. 2019. "The Illustrated GPT-2 (Visualizing Transformer Language Models)." August 12. Accessed 2024-05-25.
  5. Ali, M., Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, and Nicolas Flores-Herr. 2024. "Tokenizer Choice For LLM Training: Negligible or Crucial?" v4, arXiv, March 17. Accessed 2024-05-21.
  6. Awadallah, A., Andres Codas, Luciano Del Corro, Hamed Khanpour, Shweti Mahajan, Arindam Mitra, Hamid Palangi, Corby Rosset, Clarisse Simoes Ribeiro, and Guoqing Zheng. 2023. "Orca 2: Teaching Small Language Models How to Reason." Microsoft Research Blog, November 20. Accessed 2024-05-25.
  7. Bahdanau, D., K. Cho, and Y. Bengio. 2016. "Neural Machine Translation by Jointly Learning to Align and Translate." arXiv, v7, May 19. Accessed 2024-05-24.
  8. Bashir, D. 2023. "In-Context Learning, In Context." The Gradient, April 29. Accessed 2024-05-25.
  9. Bengio, Y., R. Ducharme, P. Vincent, and C. Jauvin. 2003. "A Neural Probabilistic Language Model." Journal of Machine Learning Research, vol. 3, pp. 1137–1155. Accessed 2024-05-24.
  10. Bhayana, R. 2024. "Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications." Radiology, RSNA, vol. 310, no. 1, January 16. Accessed 2024-05-25.
  11. Brown, T.B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. "Language Models are Few-Shot Learners." v4, arXiv, July 22.
  12. Chaudhary, A. 2024. "Harnessing the Power of Multimodal LLMs for Competitive Business Advantage." Turing, January 19. Accessed 2024-05-24.
  13. Choi, N. 2023. "The architecture of today’s LLM applications." Blog, GitHub, October 30. Accessed 2024-05-22.
  14. Dettmers, T., Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. "QLoRA: Efficient Finetuning of Quantized LLMs." v1, arXiv, May 23. Accessed 2024-05-21.
  15. Devlin, J., Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." v2, arXiv, May 24. Accessed 2024-05-25.
  16. Dong, Y., Xue Jiang, Huanyu Liu, Zhi Jin, and Ge Li. 2024. "Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models." v2, arXiv, May 16. Accessed 2024-05-25.
  17. Elias, J. 2023. "Google’s newest A.I. model uses nearly five times more text data for training than its predecessor." CNBC, May 16. Updated 2023-05-17. Accessed 2024-05-25.
  18. Faysse, M., Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F.T. Martins, Gautier Viaud, Céline Hudelot, and Pierre Colombo. 2024. "CroissantLLM: A Truly Bilingual French-English Language Model." v4, arXiv, March 29. Accessed 2024-05-23.
  19. Feng, B. 2024. "Decoding the Wizardry — Part 2: BBPE Opens Doors for GPTs." On Medium, May 18. Accessed 2024-05-25.
  20. Feng, L., Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, and Greg Mori. 2024. "Attention as an RNN." v1, arXiv, May 22. Accessed 2024-05-25.
  21. Gao, Y., Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. 2024. "Retrieval-Augmented Generation for Large Language Models: A Survey." v5, arXiv, March 27. Accessed 2024-05-21.
  22. Gerstgrasser, M., Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, and Sanmi Koyejo. 2024. "Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data." v2, arXiv, April 29. Accessed 2024-05-25.
  23. Gloeckle, F., Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, and Gabriel Synnaeve. 2024. "Better & Faster Large Language Models via Multi-token Prediction." v1, arXiv, April 30. Accessed 2024-05-21.
  24. Google. 2024. "Illuminate." Experiment, Google. Accessed 2024-05-25.
  25. Greyling, C. 2023. "LLM Drift, Prompt Drift, Chaining & Cascading." On Medium, September 19. Accessed 2024-05-25.
  26. Hendrycks, D. 2024. "Scaling Laws." Section 2.4 in Introduction to AI Safety, Ethics and Society, Center for AI Safety. Accessed 2024-05-24.
  27. Hoffmann, J., Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. 2022. "Training Compute-Optimal Large Language Models." v1, arXiv, March 29. Accessed 2024-05-24.
  28. Houlsby, N., Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. "Parameter-Efficient Transfer Learning for NLP." v2, arXiv, June 13. Accessed 2024-05-25.
  29. Hu, E.J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. "LoRA: Low-Rank Adaptation of Large Language Models." v2, arXiv, October 16. Accessed 2024-05-21.
  30. IBM. 2023. "What are LLMs?" IBM, November 2. Accessed 2024-05-25.
  31. Janakiram MSV. 2024. "The Building Blocks of LLMs: Vectors, Tokens and Embeddings." The New Stack, February 8. Accessed 2024-05-25.
  32. Javaheripi, M. and S. Bubeck. 2023. "Phi-2: The surprising power of small language models." Microsoft Research Blog, December 12. Accessed 2024-05-25.
  33. Kumari, P. 2023. "Unveiling InstructGPT: A Powerful Language Model by OpenAI." Blog, Labellerr, November 16. Accessed 2024-05-23.
  34. LLM360. 2023. "Introducing LLM360: Fully Transparent Open-Source LLMs." Blog, LLM360, December 11. Accessed 2024-05-25.
  35. Lee, M. 2023. "A Mathematical Investigation of Hallucination and Creativity in GPT Models." Mathematics, MDPI, vol. 11, no. 10, article no. 2320, May 16. Accessed 2024-05-24.
  36. Lee, J., Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. "BioBERT: a pre-trained biomedical language representation model for biomedical text mining." Bioinformatics, vol. 36, no. 4, pp. 1234-1240, February. Accessed 2024-05-23.
  37. Lei, D., Yaxi Li, Mengya Hu, Mingyu Wang, Vincent Yun, Emily Ching, and Eslam Kamal. 2023. "Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations." v2, arXiv, October 9. Accessed 2024-05-25.
  38. Lewis, P., Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2021. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." v4, arXiv, April 12. Accessed 2024-05-21.
  39. Liu, H., Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel. 2022. "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning." v2, arXiv, August 26. Accessed 2024-05-21.
  40. Liu, Y., Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, and Bao Ge. 2024a. "Understanding LLMs: A Comprehensive Overview from Training to Inference." v2, arXiv, January 6. Accessed 2024-05-21.
  41. Liu, Z., Aoxiao Zhong, Yiwei Li, Longtao Yang, Chao Ju, Zihao Wu, Chong Ma, Peng Shu, Cheng Chen, Sekeun Kim, Haixing Dai, Lin Zhao, Lichao Sun, Dajiang Zhu, Jun Liu, Wei Liu, Dinggang Shen, Xiang Li, Quanzheng Li, and Tianming Liu. 2024b. "Radiology-GPT: A Large Language Model for Radiology." v2, arXiv, March 19. Accessed 2024-05-24.
  42. Luo, H., Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, and Dongmei Zhang. 2023. "WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct." v1, arXiv, August 18. Accessed 2024-05-23.
  43. Math Insight. 2024. "Examples of n-dimensional vectors." Math Insight. Accessed 2024-05-25.
  44. Mehta, S., Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, and Mohammad Rastegari. 2024. "OpenELM: An Efficient Language Model Family with Open Training and Inference Framework." v2, arXiv, May 2. Accessed 2024-05-25.
  45. Merritt, R. 2023. "What Are Foundation Models?" Blog, NVIDIA, March 13. Accessed 2024-05-23.
  46. Meta. 2023. "Introducing Code Llama, a state-of-the-art large language model for coding." Blog, Meta, August 24. Updated 2024-01-29. Accessed 2024-05-23.
  47. Meta. 2024. "Introducing Meta Llama 3: The most capable openly available LLM to date." Blog, Meta, April 18. Accessed 2024-05-25.
  48. Microsoft. 2024. "How generative AI and LLMs work." .NET, Microsoft, May 21. Accessed 2024-05-25.
  49. Milmo, D. 2023. "ChatGPT reaches 100 million users two months after launch." The Guardian, February 2. Accessed 2024-05-25.
  50. Minaee, S., Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. 2024. "Large Language Models: A Survey." v2, arXiv, February 20. Accessed 2024-05-21.
  51. Monigatti, L. 2023. "Recreating Amazon’s New Generative AI Feature: Product Review Summaries." On Medium, November 21. Accessed 2024-05-25.
  52. Munkhdalai, T., M. Faruqui, and S. Gopal. 2024. "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention." v1, arXiv, April 10. Accessed 2024-05-24.
  53. OpenAI. 2021. "OpenAI Codex." OpenAI, August 10. Accessed 2024-05-23.
  54. OpenAI. 2022a. "Aligning language models to follow instructions." OpenAI, January 27. Accessed 2024-05-25.
  55. OpenAI. 2022b. "Introducing ChatGPT." OpenAI, November 30. Accessed 2024-05-25.
  56. OpenAI. 2023. "GPT-4." OpenAI, March 14. Accessed 2024-05-25.
  57. OpenAI. 2024. "Tokenizer." OpenAI. Accessed 2024-05-24.
  58. OpenAI. 2024a. "How to count tokens with tiktoken." OpenAI Cookbook, OpenAI, January 25. Accessed 2024-05-25.
  59. OpenAI. 2024b. "Embeddings." Documentation, OpenAI. Accessed 2024-05-25.
  60. Ouyang, L., Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. "Training language models to follow instructions with human feedback." v1, arXiv, March 4. Accessed 2024-05-25.
  61. Pan, Y., L. Pan, W. Chen, P. Nakov, M.-Y. Kan, and W.Y. Wang. 2023. "On the Risk of Misinformation Pollution with Large Language Models." Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 1389–1403, December 6-10. Accessed 2024-05-21.
  62. Pichai, S. and D. Hassabis. 2024. "Our next-generation model: Gemini 1.5." Blog, Google, February 15. Accessed 2024-05-24.
  63. Pont, T.D., Federico Galli, Andrea Loreggia, Giuseppe Pisano, Riccardo Rovatti, and Giovanni Sartor. 2023. "Legal Summarisation through LLMs: The PRODIGIT Project." v1, arXiv, August 4. Accessed 2024-05-25.
  64. Press, O. 2017. "Neural Language Models Explained." Blog, September 7. Accessed 2024-05-24.
  65. Ramponi, M. 2023. "The Full Story of Large Language Models and RLHF." Blog, AssemblyAI, May 3. Accessed 2024-05-23.
  66. Raschka, S. 2023. "Finetuning LLMs Efficiently with Adapters." Ahead of AI, May 20. Accessed 2024-05-25.
  67. Rosenfeld, R. 2000. "Two decades of statistical language modeling: where do we go from here?" Proceedings of the IEEE, vol. 88, no. 8, pp. 1270-1278, August, doi: 10.1109/5.880083. Accessed 2024-05-21.
  68. Ruder, S. 2018. "NLP's ImageNet moment has arrived." July 12. Accessed 2024-05-25.
  69. Ruder, S. 2024. "The Evolving Landscape of LLM Evaluation." NLP News, May 13. Accessed 2024-05-25.
  70. Saini, G. 2023. "How to build a Smart Customer Service Chatbot with LLMs, Vector Library and Streamlit along with the chat history." On Medium, October 23. Accessed 2024-05-25.
  71. Schifeling, J. 2024. "The No. 1 AI mistake job seekers make, from a career expert: So many people use ChatGPT ‘in exactly the wrong way’." CNBC, March 18. Accessed 2024-05-25.
  72. Taori, R., I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto. 2023. "Alpaca: A Strong, Replicable Instruction-Following Model." CRFM, Stanford University, March 13. Accessed 2024-05-23.
  73. Tovar, A. D. 2023. "Supposed leak of GPT4 architecture." LinkedIn Pulse, July 11. Accessed 2024-05-25.
  74. Varshney, T. 2023. "Introduction to LLM Agents." Blog, NVIDIA, November 30. Accessed 2024-05-22.
  75. Vaswani, A., Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. "Attention Is All You Need." v5, arXiv, December 06. Accessed 2024-05-25.
  76. Vrdoljak, J. 2023. "Primer on Large Language Models (from scaling laws to prompting and emergent capabilities)." On Medium, May 18. Accessed 2024-05-24.
  77. Weng, Lilian. 2018. "Attention? Attention!" Lil'Log, June 24. Accessed 2024-05-24.
  78. Wolfe, C. R. 2024. "Supervised Fine-Tuning (SFT) with Large Language Models." Towards Data Science, on Medium, January 17. Accessed 2024-05-22.
  79. Wu, S., Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. 2023. "BloombergGPT: A Large Language Model for Finance." v3, arXiv, December 21. Accessed 2024-05-23.
  80. Xu, F. F., Uri Alon, Graham Neubig, and Vincent J. Hellendoorn. 2022. "A Systematic Evaluation of Large Language Models of Code." v3, arXiv, May 4. Accessed 2024-05-23.
  81. Yan, E. 2024. "Open LLMs." On GitHub, May 25. Accessed 2024-05-25.
  82. Zhao, W. X., K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, Y. Du, C. Yang, Y. Chen, Z. Chen, J. Jiang, R. Ren, Y. Li, X. Tang, Z. Liu, P. Liu, J. Nie and J.-R. Wen. 2023. "A Survey of Large Language Models." v13, arXiv, November 24. Accessed 2024-05-21.
  83. Zhao, H., Zhengliang Liu, Zihao Wu, Yiwei Li, Tianze Yang, Peng Shu, Shaochen Xu, Haixing Dai, Lin Zhao, Gengchen Mai, Ninghao Liu, and Tianming Liu. 2024. "Revolutionizing Finance with LLMs: An Overview of Applications and Insights." v1, arXiv, January 22. Accessed 2024-05-25.

Further Reading

  1. Liu, Y., Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, and Bao Ge. 2024a. "Understanding LLMs: A Comprehensive Overview from Training to Inference." v2, arXiv, January 6. Accessed 2024-05-21.
  2. Wan, Z., Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, and Mi Zhang. 2024. "Efficient Large Language Models: A Survey." v3, arXiv, January 31. Accessed 2024-05-21.
  3. Minaee, S., Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. 2024. "Large Language Models: A Survey." v2, arXiv, February 20. Accessed 2024-05-21.
  4. Dong, Q., Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. 2023. "A Survey on In-context Learning." v3, arXiv, June 1. Accessed 2024-05-21.
  5. Xu, L., Haoran Xie, Si-Zhao Joe Qin, Xiaohui Tao, and Fu Lee Wang. 2023. "Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment." v1, arXiv, December 19. Accessed 2024-05-21.
  6. Gao, Y., Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. 2024. "Retrieval-Augmented Generation for Large Language Models: A Survey." v5, arXiv, March 27. Accessed 2024-05-21.

Cite As

Devopedia. 2024. "Large Language Model." Version 3, May 26. Accessed 2024-05-26.
Contributed by
1 author

Last updated on
2024-05-26 10:41:20