
90-Day Foundation Models & Gen AI Study Roadmap

I’m working through this comprehensive roadmap to deeply understand foundation models - from attention mechanisms to diffusion models to modern LLMs.

Time Commitment: 3-4 hours/day on weekdays, 5-6 hours/day on weekends
Total: ~320 hours


Why This Roadmap?

As a data engineer transitioning into AI, I wanted a structured path that goes beyond surface-level tutorials. This roadmap covers the foundational concepts that power today’s AI systems, with a focus on actually implementing things from scratch.


Phase Overview

Phase    Weeks   Focus Area                                  Hours
Phase 1  1-3     Transformers & Attention Mechanisms         ~70 hrs
Phase 2  4-5     Diffusion Models                            ~50 hrs
Phase 3  6-7     Foundation Model Architectures (Language)   ~50 hrs
Phase 4  8-9     Vision Transformers & VL Models             ~50 hrs
Phase 5  10-11   Training & Inference Optimization           ~50 hrs
Phase 6  12-13   Evaluation, Ethics & Advanced Topics        ~50 hrs

Phase 1: Transformers & Attention (Weeks 1-3)

Week 1: Attention Fundamentals

Days 1-2: Attention Origins

  • Encoder-decoder architecture limitations
  • Bahdanau & Luong attention mechanisms

Days 3-4: The Transformer Architecture

  • Multi-head self-attention
  • Positional encodings (sinusoidal variant sketched after this list)
  • Layer normalization & residual connections
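
To make the positional-encoding item concrete, here is a minimal sketch of the fixed sinusoidal encodings from the original Transformer paper. PyTorch assumed; the function name and the even-`d_model` assumption are mine.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encodings from "Attention Is All You Need" (assumes even d_model)."""
    position = torch.arange(seq_len).unsqueeze(1)                      # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)                       # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)                       # odd dimensions
    return pe

# Added to the token embeddings before the first encoder layer:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```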

Days 5-7: Hands-on Implementation

  • Implement scaled dot-product attention from scratch (see the sketch after this list)
  • Build multi-head attention module
  • Create a minimal transformer encoder
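
Here is roughly what I expect the first of these to look like; a minimal sketch, assuming PyTorch, with the mask convention (0 = masked out) chosen by me.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_head). Returns (attended values, attention weights)."""
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5            # (batch, heads, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))   # 0 in the mask = not attended
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights
```

Multi-head attention is then just learned projections into per-head subspaces, this function, and a final output projection.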

Key Resources:

Week 2: Advanced Attention

  • Self-attention deep dive & interpretability
  • Cross-attention mechanisms
  • Efficient attention variants (Longformer, BigBird)

Week 3: Perceiver & Non-Parametric Transformers

  • Handling arbitrary input modalities
  • In-context learning as implicit Bayesian inference

Checkpoint: Build a multi-modal classifier using a Perceiver-style architecture
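
The core trick behind this checkpoint is compact enough to sketch: a small set of learned latent vectors cross-attends to a flattened input of any modality, so cost scales with the number of latents rather than the input length. A minimal sketch, assuming PyTorch; module and dimension names are mine.

```python
import torch
import torch.nn as nn

class PerceiverBlock(nn.Module):
    """Learned latent array cross-attends to an arbitrary flattened input (Perceiver-style)."""
    def __init__(self, num_latents=64, d_latent=256, d_input=256, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, d_latent))
        self.cross_attn = nn.MultiheadAttention(embed_dim=d_latent, kdim=d_input,
                                                vdim=d_input, num_heads=n_heads,
                                                batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(d_latent),
                                nn.Linear(d_latent, 4 * d_latent),
                                nn.GELU(),
                                nn.Linear(4 * d_latent, d_latent))

    def forward(self, inputs):                              # inputs: (batch, seq_len, d_input)
        lat = self.latents.expand(inputs.size(0), -1, -1)   # one latent array per batch item
        lat, _ = self.cross_attn(query=lat, key=inputs, value=inputs)
        return lat + self.ff(lat)                           # (batch, num_latents, d_latent)
```

A classifier head would mean-pool the latents and apply a linear layer; the multi-modal part is mostly about flattening each modality into the same (seq_len, d_input) shape.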


Phase 2: Diffusion Models (Weeks 4-5)

Week 4: Diffusion Fundamentals

  • Forward & reverse diffusion processes (closed-form forward step sketched after this list)
  • Score matching perspective
  • DDPM & DDIM sampling
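
The forward process has a closed form worth implementing before anything else: x_t is just a scaled x_0 plus Gaussian noise. A minimal sketch, assuming PyTorch and a simple linear beta schedule; variable names are mine.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                  # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)     # cumulative product, a.k.a. \bar{alpha}_t

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    abar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over image dims
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
```

The DDPM training objective is then just predicting `noise` from `q_sample(x0, t)` and `t` with an MSE loss.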

Key Resources:

Week 5: Latent Diffusion

  • VAE latent space compression
  • Stable Diffusion architecture
  • ControlNet & conditional generation

Checkpoint: Fine-tune a latent diffusion model using LoRA
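
Before reaching for a library like `peft` or `diffusers` for the actual fine-tune, I want to understand the LoRA update itself: freeze the pretrained weight and learn a low-rank correction. A minimal sketch, assuming PyTorch; illustrative, not what I'd ship.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update, scaled by alpha / r."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # the original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```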


Phase 3: Language Model Architectures (Weeks 6-7)

Week 6: GPT & Decoder-Only Models

  • GPT-1 through GPT-4 evolution
  • LLaMA, Mistral, Mixture of Experts

Week 7: BERT, T5 & State Space Models

  • Masked vs Causal Language Modeling
  • S4 & Mamba architectures

Checkpoint: Compare BERT-style MLM vs GPT-style CLM
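
To keep the comparison concrete, here are the two objectives on the same token batch; a minimal sketch, assuming PyTorch, where `model` is any network mapping token ids to vocabulary logits (bidirectional for MLM, causally masked for CLM).

```python
import torch
import torch.nn.functional as F

def clm_loss(model, tokens):
    """Causal LM (GPT-style): predict token t+1 from tokens <= t."""
    logits = model(tokens[:, :-1])                        # (batch, seq-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def mlm_loss(model, tokens, mask_id, mask_prob=0.15):
    """Masked LM (BERT-style): corrupt ~15% of tokens, predict only those positions."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    logits = model(tokens.masked_fill(mask, mask_id))     # (batch, seq, vocab)
    labels = tokens.masked_fill(~mask, -100)              # -100 = ignored by cross_entropy
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)
```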


Phase 4: Vision-Language Models (Weeks 8-9)

Week 8: Vision Transformers

  • ViT patch embeddings
  • DeiT training recipes
  • MLP-Mixer & ConvNeXt

Week 9: VL Models

  • CLIP & contrastive learning
  • Flamingo, BLIP-2, LLaVA architectures

Checkpoint: Build CLIP-based image-text retrieval system
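
The retrieval step itself is tiny once embeddings exist: L2-normalize both sides and rank by cosine similarity. A minimal sketch, assuming PyTorch; the random tensors stand in for embeddings produced by CLIP's image and text encoders (e.g. via `open_clip` or Hugging Face `transformers`).

```python
import torch
import torch.nn.functional as F

def retrieve(text_emb, image_embs, top_k=5):
    """Rank images by cosine similarity to a query text embedding (CLIP-style retrieval)."""
    text_emb = F.normalize(text_emb, dim=-1)          # (d,)
    image_embs = F.normalize(image_embs, dim=-1)      # (num_images, d)
    sims = image_embs @ text_emb                      # cosine similarities, (num_images,)
    return sims.topk(top_k)                           # (scores, indices) of the best matches

# Placeholder embeddings; in the real checkpoint these come from the CLIP encoders.
scores, indices = retrieve(torch.randn(512), torch.randn(1000, 512))
```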


Phase 5: Training & Inference (Weeks 10-11)

Week 10: Pre-training & Fine-tuning

  • Pre-training objectives (CLM, MLM, contrastive)
  • Instruction tuning, RLHF, DPO (DPO loss sketched below)
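
The DPO objective is compact enough to write down directly: increase the policy's margin between the chosen and rejected response relative to a frozen reference model. A minimal sketch, assuming PyTorch; the per-sequence log-probabilities are placeholders computed elsewhere.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: -log sigmoid(beta * (policy margin - reference margin)), averaged over the batch."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```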

Week 11: PEFT & Efficient Inference

  • LoRA, QLoRA, adapters
  • Flash Attention, KV-cache, quantization
  • RAG implementation

Checkpoint: Build RAG system with QLoRA-adapted generator
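
The retrieval half of this checkpoint is easy to prototype before wiring in the QLoRA-adapted generator. A minimal sketch, assuming PyTorch; `embed` and `generate` are hypothetical stand-ins for an embedding model and the adapted LLM.

```python
import torch.nn.functional as F

def rag_answer(question, passages, passage_embs, embed, generate, top_k=3):
    """Retrieve the top-k passages by cosine similarity, then condition the generator on them."""
    q_emb = F.normalize(embed(question), dim=-1)        # (d,)
    p_embs = F.normalize(passage_embs, dim=-1)          # (num_passages, d)
    _, idx = (p_embs @ q_emb).topk(top_k)
    context = "\n\n".join(passages[i] for i in idx.tolist())
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)                             # call into the QLoRA-adapted model
```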


Phase 6: Evaluation & Ethics (Weeks 12-13)

Week 12: Benchmarks

  • MMLU, HellaSwag, HumanEval
  • Hallucination detection

Week 13: Safety & Advanced Topics

  • Bias & fairness
  • Privacy & memorization
  • Prompt engineering patterns

Follow Along

I’ll be documenting my progress through this roadmap on this blog. Sign up below to get weekly updates on what I’m learning, code implementations, and insights from the papers I’m reading.


Key Resources

Courses

Blogs

Code