Join the NPTEL Course on Large Language Models
Gain in-depth knowledge and practical skills in Large Language Models through our expertly designed course. Enroll now to access exclusive content, hands-on projects, and personalized mentorship.
Course Highlights:
⏰ 12 weeks of expert-led content
🤖 Real-world applications in AI and NLP
🏅 Optional certification after successful exam
Book Overview
Below is a detailed overview of the key chapters that form the foundation of this book, guiding readers through the essential concepts and advanced topics in Large Language Models.
-
▶Chapter 01. Introduction
- 1.1 What Is a Language Model?
- 1.2 Evolution of Language Modelling Technologies
- 1.3 Scaling Laws in Language Models
-
▶1.4 Evolution of LLMs
- 1.4.1 The Emergence and Development of LLMs
- 1.4.2 Implications of Encoder-Decoder in LLM Development
- 1.4.3 Optimising Scale and Resource Efficiency in LLMs
- 1.5 Organisation of the Book
-
▶Chapter 02. An Overview of Natural Language Processing and Neural Networks
-
▶Part I: Natural Language Processing
- 2.1 Computational Linguistics and Natural Language Processing
- 2.2 Overview of the Natural Language Processing Pipeline
-
▶2.3 Morphology
- 2.3.1 Morphemes
- 2.3.2 Stemming
- 2.3.3 Lemmatisation
- 2.3.4 Lexicon
-
▶2.4 Tokenisation
- 2.4.1 Advanced Techniques: Subword Tokenisation
- 2.5 Syntactics
- 2.6 Semantics
- 2.7 Introduction to Language Modelling
-
▶Part II: Neural Networks
-
▶2.8 The Perceptron
- 2.8.1 Definition
- 2.8.2 Implementing AND, OR, and XOR Logic
-
▶2.9 Multilayer Perceptron
- 2.9.1 Neural Networks
- 2.9.2 Types of Activation Functions
-
▶2.10 Training Neural Networks
- 2.10.1 Backpropagation
- 2.10.2 Batching
- 2.10.3 Hyperparameters
- 2.10.4 Regularisation
- 2.11 Vanishing and Exploding Gradients
- 2.12 Evaluation Metrics
- 2.13 Summary
-
▶ Chapter 03. Word Embedding
- 3.1 Distributional Hypothesis
-
▶ 3.2 Vector Semantics
- 3.2.1 Defining and Measuring Semantic Similarity
-
▶ 3.3 Types of Word Embedding
- 3.3.1 Frequency-Based Embeddings
- 3.3.2 Word2Vec
- 3.3.3 Global Vectors for Word Representation
- 3.3.4 FastText
- 3.4 Bias in Word Embedding
- 3.5 Limitations of Word Embedding Methods
- 3.6 Applications of Word Embeddings
- 3.7 Summary
-
▶ Chapter 04. Statistical Language Model
-
▶ 4.1 Statistical Language Model
- 4.1.1 The Conditional Probability
- 4.1.2 The Chain Rule of Probability
- 4.1.3 The Markov Assumption
- 4.1.4 Unigram Language Model
- 4.1.5 Bigram Language Model
-
▶ 4.2 Smoothing
- 4.2.1 The Unknown Tokens
- 4.2.2 Smoothing
- 4.2.3 Back-Off
- 4.2.4 Interpolation
- 4.2.5 Good-Turing
-
▶ 4.3 Evaluation of Language Model
- 4.3.1 Extrinsic Evaluation
- 4.3.2 Intrinsic Evaluation
- 4.3.3 Human Evaluation
- 4.3.4 Evaluation Metrics
- 4.3.5 Benchmark Suites
- 4.4 Limitations of Statistical Language Models
- 4.5 Summary
-
▶ Chapter 05. Neural Language Models
-
▶ 5.1 Convolutional Neural Networks
- 5.1.1 Components of CNNs: Kernel, Stride, Pooling and Padding
- 5.1.2 Hierarchical and Dilated Convolutions
- 5.1.3 Applications of CNNs in NLP
-
▶ 5.2 Recurrent Neural Networks
- 5.2.1 Training RNNs
- 5.2.2 Applications of RNNs
- 5.2.3 Challenges in Sequence Modelling
- 5.2.4 RNN Variants: LSTM, GRU, and Bidirectional RNNs
-
▶ 5.3 Sequence-to-Sequence Models
- 5.3.1 Training Sequence-to-Sequence Models
- 5.3.2 Inference Decoding
- 5.3.3 Applications of Sequence-to-Sequence Models
-
▶ 5.4 Attention Mechanisms
- 5.4.1 Introduction to Attention
- 5.4.2 Advantages of Attention
- 5.4.3 Variants of Attention
- 5.5 Limitations of Neural Language Models
- 5.6 Summary
-
▶ Chapter 06. Transformers
-
▶ 6.1 Self-Attention
- 6.1.1 Multi-Head Self-Attention
-
▶ 6.2 Transformer Encoder Block
- 6.2.1 Components of the Transformer Encoder Block
- 6.2.2 Feed-Forward Neural Network
- 6.2.3 Layer Normalisation
- 6.2.4 Residual Connections
-
▶ 6.3 Transformer Decoder Block
- 6.3.1 Masked Multi-Head Self-Attention
- 6.3.2 Cross-Attention (Encoder-Decoder Attention)
-
▶ 6.4 Positional Embeddings
- 6.4.1 Types of Positional Embeddings
- 6.4.2 Rotary Position Embedding
-
▶ 6.5 Efficient Attention Mechanisms
- 6.5.1 KV Caching in Multi-Head Self-Attention
- 6.5.2 Multi-Query Attention
- 6.5.3 Grouped-Query Attention
- 6.5.4 Sliding Window Attention
-
▶ 6.6 An Alternate Formulation of Transformers
- 6.6.1 Residual Stream Perspective of Transformers
- 6.6.2 Attention Heads: Reading and Writing
- 6.6.3 Feed-Forward Networks: Transformation of Residual Streams
- 6.6.4 Prediction Head: Generating the Next Token
- 6.6.5 Decomposing the Transformer: Attention and Feed-Forward Contributions
- 6.6.6 Residual Networks as Shallow Ensembles
- 6.6.7 Interpreting the Mechanism of LLMs
- 6.7 Summary
-
▶ Chapter 07. Language Model Pretraining
-
▶ 7.1 Embeddings from Language Model
- 7.1.1 Architecture and Training of ELMo
- 7.1.2 Applications of ELMo
- 7.1.3 Limitations of ELMo
- 7.2 Evaluation Datasets
-
▶ 7.3 Encoder-Based Pretraining
- 7.3.1 Fundamentals of Encoder-Based Models
- 7.3.2 Training Paradigm
- 7.3.3 BERT Pretraining
- 7.3.4 Applications and Limitations
-
▶ 7.4 Decoder-Based Pretraining
- 7.4.1 Decoder-Based Architecture
- 7.4.2 Training Paradigm
- 7.4.3 GPT Pretraining
- 7.4.4 Applications and Limitations
-
▶ 7.5 Encoder-Decoder Based Pretraining
- 7.5.1 Architecture
- 7.5.2 Joint Pretraining Strategy
- 7.5.3 T5 Pretraining
- 7.5.4 Applications and Limitations
- 7.6 Emergence of Large Language Models
- 7.7 Limitations of Pretraining
- 7.8 Summary
-
▶ Chapter 08. Fine-Tuning and Alignment of LLMs
- 8.1 Moving from Pretraining to Fine-Tuning
-
▶ 8.2 Fine-Tuning on Various Task-Specific Applications
- 8.2.1 Sequence Classification
- 8.2.2 Pairwise Sequence Classification
- 8.2.3 Sequence Labelling
- 8.2.4 Learning Spans
- 8.2.5 Challenges in Classical Fine-Tuning Methods
- 8.3 Instruction Tuning
-
▶ 8.4 Alignment Methods
- 8.4.1 Reinforcement Learning from Human Feedback
- 8.4.2 Direct Preference Optimisation
- 8.5 Summary
-
▶ Chapter 09. Prompting Strategies in LLMs
-
▶ 9.1 Prompt Engineering
- 9.1.1 Prompt Shape
- 9.1.2 Manual Template Engineering
- 9.1.3 Automated Template Learning
- 9.1.4 Continuous Prompts
-
▶ 9.2 Prompt Application
- 9.2.1 In-Context Learning
- 9.2.2 Knowledge Probing
- 9.2.3 Classification-Based Tasks
- 9.2.4 Information Extraction
- 9.2.5 Reasoning in Natural Language Processing
- 9.2.6 Question Answering
- 9.2.7 Text Generation
- 9.2.8 Automatic Evaluation of Text Generation
- 9.3 Chain-of-Thoughts
- 9.4 Tree-of-Thoughts
- 9.5 Graph-of-Thoughts
- 9.6 Summary
-
▶ Chapter 10. Efficient Methods for Fine-Tuning LLMs
-
▶ 10.1 Model Compression with Knowledge Distillation
- 10.1.1 White-Box Knowledge Distillation
- 10.1.2 Meta Knowledge Distillation
- 10.1.3 Black-Box Knowledge Distillation
-
▶ 10.2 Model Compression Techniques
- 10.2.1 Model Pruning
- 10.2.2 Model Quantisation
-
▶ 10.3 Parameter-Efficient Fine-Tuning
- 10.3.1 Adapters
- 10.3.2 Prefix Tuning
- 10.3.3 Prompt Tuning
- 10.3.4 Selective PEFT Techniques
- 10.3.5 Reparameterisation-Based PEFT Techniques
- 10.3.6 Hybrid Approaches for Efficient Fine-Tuning
-
▶ 10.4 ★Efficient Strategies for Fine-Tuning LLMs
- 10.4.1 Mixed-Precision Tuning
- 10.4.2 Data Selection for Efficient Fine-Tuning
- 10.4.3 Prompt Compression
- 10.5 Summary
-
▶ Chapter 11. Augmented Large Language Models
-
▶ 11.1 Retrieval-Augmented Generation
- 11.1.1 Indexing in RAGs
- 11.1.2 Context Searching in RAGs
- 11.1.3 Prompting in RAGs
- 11.1.4 Inferencing in RAGs
- 11.1.5 Comparison of RAGs with LLMs
-
▶ 11.2 Evaluation of RAGs
- 11.2.1 Assessment of Retrieval Quality
- 11.2.2 Generation Quality
- 11.2.3 Knowledge Integration and Factuality Evaluation
- 11.2.4 Response Time and Efficiency
- 11.2.5 User Satisfaction
- 11.2.6 RAGAs Framework for RAG Evaluation
-
▶ 11.3 Tool Calling with LLMs
- 11.3.1 Autonomously Determining Which Tools to Use and Where
- 11.3.2 Examples of Different Tools
- 11.3.3 Evaluation of Code Generation Capabilities of Agents
- 11.3.4 Error Handling and Optimisation
-
▶ 11.4 LLM Augmentation with Agents
- 11.4.1 Reasoning in LLM Agents
- 11.4.2 Planning in LLM Agents
- 11.4.3 Handling Memory in LLM Agents
- 11.5 Summary
-
▶ Chapter 12. Multilingual and Multimodal LLMs
-
▶ 12.1 Multilingual Language Models
- 12.1.1 The Evolution of Multilingual NLP
- 12.1.2 The Need for Multilingual LLMs
- 12.1.3 Cross-Lingual Representation Learning
- 12.1.4 Applications
-
▶ 12.2 Multimodal Language Models
- 12.2.1 Integration of Diverse Modalities
- 12.2.2 Applications
-
▶ 12.3 Training Multilingual and Multimodal LLMs
- 12.3.1 Efficient Data Collection and Preprocessing
- 12.3.2 Model Training Strategies
-
▶ 12.4 Addressing Challenges in Multilingual and Multimodal LLMs
- 12.4.1 Challenges in Multilingual LLMs
- 12.4.2 Challenges in Multimodal LLMs
- 12.5 Future Directions and Emerging Trends
- 12.6 Limitations of Multilingual and Multimodal LLMs
- 12.7 Summary
-
▶ Chapter 13. Responsible LLMs
- 13.1 Inaccurate, Inappropriate, and Unethical Behaviour of LLMs
- 13.2 Responsible AI
-
▶ 13.3 Bias
- 13.3.1 Visibility of Bias
- 13.3.2 Source of Bias
- 13.4 Bias Mitigation
- 13.5 Summary
-
▶ Chapter 14. Advanced Topics in Large Language Models
-
▶ 14.1 Reasoning with LLMs
- 14.1.1 Advancements in Reasoning Capabilities
- 14.1.2 Challenges in Reasoning with LLMs
- 14.1.3 Types of Reasoning Tasks
- 14.1.4 How Do LLMs Approach Reasoning?
- 14.1.5 Evaluating Reasoning Abilities in LLMs
-
▶ 14.2 Handling Long Context in LLMs
- 14.2.1 Challenges in Processing Long Context
- 14.2.2 Training and Fine-Tuning Approaches to Extend Context Length
- 14.2.3 Evaluation of Long-Context LLMs
-
▶ 14.3 Model Editing
- 14.3.1 Conditions for Successful Editing
- 14.3.2 Methods for Model Editing
- 14.3.3 Metrics for Evaluation of Model Editing
-
▶ 14.4 Hallucination in LLMs
- 14.4.1 Definition
- 14.4.2 Sources of Hallucination
- 14.4.3 Metrics Measuring Hallucination
- 14.4.4 Hallucination Mitigation
-
▶ 14.5 Self-Evolving LLMs
- 14.5.1 Conceptual Framework
- 14.5.2 Evolution Objectives and Techniques
- 14.5.3 Challenges
- 14.6 Summary
-
▶ Chapter 15. LLMs in Action
-
▶ 15.1 An Overview of the Landscape
- 15.1.1 Tracing the Evolution and Importance of LLMs in Contemporary AI
- 15.1.2 Open-Source vs Closed-Source Paradigms: Benefits and Trade-offs
-
▶ 15.2 A Panoramic View of LLMs
- 15.2.1 General-Purpose Large Language Models
- 15.2.2 Language-Specific LLMs
- 15.2.3 Domain-Specific LLMs
- 15.2.4 Task-Specific LLMs
-
▶ 15.3 Diverse Applications of LLMs
- 15.3.1 Healthcare: Enhancing Diagnostics and Patient Care
- 15.3.2 Finance: Transforming Data Analysis and Risk Management
- 15.3.3 Legal: Streamlining Research and Case Management
- 15.3.4 Education: Personalised Learning and Academic Support
-
▶ 15.4 Emerging Trends and Future Directions in LLMs
- 15.4.1 Beyond Text: The Advent of Multimodal LLMs
- 15.4.2 Autonomous Agents: The LLM Leap in AI Evolution (AutoGPT)
- 15.5 Summary
Check Out the Lectures on YouTube
Explore a collection of step-by-step video lectures on YouTube to enhance your understanding of Large Language Models and related concepts.