90-Day Foundation Models & Gen AI Roadmap
A structured 90-day study plan to master transformers, diffusion models, and large language models. Join me as I work through this roadmap and share my learnings.
I’m working through this roadmap to deeply understand foundation models, from attention mechanisms to diffusion models to modern LLMs.
Time Commitment: 3-4 hours/day on weekdays, 5-6 hours/day on weekends
Total Hours: ~320 hours
Why This Roadmap?
As a data engineer transitioning into AI, I wanted a structured path that goes beyond surface-level tutorials. This roadmap covers the foundational concepts that power today’s AI systems, with a focus on actually implementing things from scratch.
Phase Overview
| Phase | Weeks | Focus Area | Hours |
|---|---|---|---|
| Phase 1 | 1-3 | Transformers & Attention Mechanisms | ~70 hrs |
| Phase 2 | 4-5 | Diffusion Models | ~50 hrs |
| Phase 3 | 6-7 | Foundation Model Architectures (Language) | ~50 hrs |
| Phase 4 | 8-9 | Vision Transformers & VL Models | ~50 hrs |
| Phase 5 | 10-11 | Training & Inference Optimization | ~50 hrs |
| Phase 6 | 12-13 | Evaluation, Ethics & Advanced Topics | ~50 hrs |
Phase 1: Transformers & Attention (Weeks 1-3)
Week 1: Attention Fundamentals
Days 1-2: Attention Origins
- Encoder-decoder architecture limitations
- Bahdanau & Luong attention mechanisms
Days 3-4: The Transformer Architecture
- Multi-head self-attention
- Positional encodings (see the sketch below)
- Layer normalization & residual connections
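To make the positional-encoding bullet concrete, here’s a minimal sketch of the sinusoidal encodings from the original paper. I’m assuming PyTorch for the code sketches in this post; the function name and shapes are my own.

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) matrix of fixed sinusoidal position encodings."""
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    # Frequencies fall off geometrically across dimensions: 1 / 10000^(2i / d_model)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-torch.log(torch.tensor(10000.0)) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)   # even dimensions use sine
    pe[:, 1::2] = torch.cos(positions * div_term)   # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)  # torch.Size([128, 512])
```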
Days 5-7: Hands-on Implementation
- Implement scaled dot-product attention from scratch (see the sketch below)
- Build multi-head attention module
- Create a minimal transformer encoder
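For the from-scratch days, this is roughly the core function everything else builds on. A sketch, assuming PyTorch and a (batch, heads, seq, head_dim) layout:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, head_dim). Returns (output, attention weights)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)          # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # hide disallowed positions
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Smoke test with random tensors: batch=2, heads=4, seq=10, head_dim=16
q = k = v = torch.randn(2, 4, 10, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 4, 10, 16]) torch.Size([2, 4, 10, 10])
```

Multi-head attention is then mostly bookkeeping: project into heads, call this function, concatenate, and project back out.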
Key Resources:
- Attention Is All You Need (Vaswani et al., 2017)
- The Illustrated Transformer
- Andrej Karpathy: Let’s build GPT
Week 2: Advanced Attention
- Self-attention deep dive & interpretability
- Cross-attention mechanisms
- Efficient attention variants (Longformer, BigBird)
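Most efficient-attention variants boil down to restricting which positions may attend to which. Here’s a sketch of a Longformer-style sliding-window mask (naming is my own) that could be dropped into the attention function from Week 1:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where |i - j| <= window."""
    idx = torch.arange(seq_len)
    return (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window

print(sliding_window_mask(seq_len=8, window=2).int())
# Each row has at most 2 * window + 1 ones, so the attention cost grows
# linearly with sequence length instead of quadratically.
```

The real Longformer/BigBird kernels never materialize the full matrix; the dense mask here is just to make the sparsity pattern visible.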
Week 3: Perceiver & Non-Parametric Transformers
- Handling arbitrary input modalities
- In-context learning as implicit Bayesian inference
Checkpoint: Build a multi-modal classifier using a Perceiver-style architecture
Phase 2: Diffusion Models (Weeks 4-5)
Week 4: Diffusion Fundamentals
- Forward & reverse diffusion processes
- Score matching perspective
- DDPM & DDIM sampling
Week 5: Latent Diffusion
- VAE latent space compression
- Stable Diffusion architecture
- ControlNet & conditional generation
Checkpoint: Fine-tune a latent diffusion model using LoRA
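For the checkpoint I’ll likely lean on the Hugging Face diffusers library rather than reimplement the sampler. A minimal inference sketch, assuming diffusers is installed and the checkpoint id below is available to you (substitute whatever Stable Diffusion weights you can access); the LoRA fine-tuning itself would use the training scripts that ship with diffusers/PEFT:

```python
import torch
from diffusers import StableDiffusionPipeline

# The checkpoint id is an assumption; swap in whichever Stable Diffusion
# weights you have access to.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU

image = pipe("a watercolor painting of a data center at sunset").images[0]
image.save("sample.png")
```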
Phase 3: Language Model Architectures (Weeks 6-7)
Week 6: GPT & Decoder-Only Models
- GPT-1 through GPT-4 evolution
- LLaMA, Mistral, Mixture of Experts
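Architecturally, every decoder-only model in this list shares the same causal constraint: position i can only attend to positions up to i. A tiny sketch of that mask, which plugs into the Week 1 attention function:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Lower-triangular mask: position i may only attend to positions j <= i."""
    return torch.tril(torch.ones(seq_len, seq_len)).bool()

print(causal_mask(5).int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```

The differences between GPT variants, LLaMA, and Mistral live mostly elsewhere: normalization placement, positional scheme (e.g. RoPE), and feed-forward or MoE details.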
Week 7: BERT, T5 & State Space Models
- Masked vs Causal Language Modeling
- S4 & Mamba architectures
Checkpoint: Compare BERT-style MLM vs GPT-style CLM
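For the checkpoint, the key difference is how inputs and labels are constructed. A sketch on a toy sequence (the token ids and mask id are made up; -100 is PyTorch’s default ignore index for cross-entropy):

```python
import torch

tokens = torch.tensor([101, 7592, 2088, 2003, 4569, 102])   # toy token ids

# Causal LM (GPT-style): every position predicts the next token.
clm_inputs = tokens[:-1]
clm_labels = tokens[1:]                        # labels are the inputs shifted by one

# Masked LM (BERT-style): hide a random subset of tokens and predict only those.
MASK_ID = 103                                  # placeholder mask-token id
mlm_inputs = tokens.clone()
mlm_labels = torch.full_like(tokens, -100)     # positions set to -100 are ignored by the loss
masked_positions = torch.tensor([2, 4])        # e.g. ~15% of positions, chosen at random
mlm_labels[masked_positions] = tokens[masked_positions]
mlm_inputs[masked_positions] = MASK_ID

print(clm_inputs, clm_labels)
print(mlm_inputs, mlm_labels)
```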
Phase 4: Vision-Language Models (Weeks 8-9)
Week 8: Vision Transformers
- ViT patch embeddings (see the sketch below)
- DeiT training recipes
- MLP-Mixer & ConvNeXt
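The patch-embedding step is small enough to sketch directly; a strided convolution is the usual trick for "cut into patches, then project". Sizes below follow the ViT-Base/16 defaults, and the class name is my own:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to an embedding."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to "cut into patches, then linear-project".
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, 3, 224, 224)
        x = self.proj(x)                        # (B, embed_dim, 14, 14)
        return x.flatten(2).transpose(1, 2)     # (B, 196, embed_dim): a sequence of patch tokens

patches = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(patches.shape)  # torch.Size([2, 196, 768])
```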
Week 9: Vision-Language Models
- CLIP & contrastive learning
- Flamingo, BLIP-2, LLaVA architectures
Checkpoint: Build a CLIP-based image-text retrieval system
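Retrieval in the checkpoint is just cosine similarity between normalized embeddings; training uses CLIP’s symmetric contrastive loss. A sketch with a fixed temperature (CLIP actually learns it):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matching pairs sit on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)       # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> image direction
    return (loss_i2t + loss_t2i) / 2

loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```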
Phase 5: Training & Inference (Weeks 10-11)
Week 10: Pre-training & Fine-tuning
- Pre-training objectives (CLM, MLM, contrastive)
- Instruction tuning, RLHF, DPO
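Of these, DPO is the easiest to see in code. A sketch of its loss, assuming you already have the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model (beta is the usual scaling hyperparameter):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss over per-example summed log-probs, each of shape (batch,)."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp        # implicit reward of preferred answer
    rejected_margin = policy_rejected_logp - ref_rejected_logp  # implicit reward of dispreferred answer
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
print(loss.item())
```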
Week 11: PEFT & Efficient Inference
- LoRA, QLoRA, adapters
- Flash Attention, KV-cache, quantization
- RAG implementation
Checkpoint: Build a RAG system with a QLoRA-adapted generator
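The LoRA idea itself fits in a few lines: freeze the pretrained weight and learn a low-rank additive update. A sketch (QLoRA additionally quantizes the frozen base to 4-bit, which I’m not showing here):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha / r) * B(A x)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)             # freeze the pretrained weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)      # the update starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```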
Phase 6: Evaluation & Ethics (Weeks 12-13)
Week 12: Benchmarks
- MMLU, HellaSwag, HumanEval
- Hallucination detection
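Most multiple-choice benchmarks like MMLU and HellaSwag are scored by comparing the model’s log-likelihood of each candidate answer. A sketch of that scoring loop, where `sequence_logprob` is a hypothetical helper you’d write on top of your model:

```python
def score_multiple_choice(question, choices, sequence_logprob):
    """Pick the choice whose continuation the model finds most likely.

    `sequence_logprob(prompt, continuation)` is a hypothetical helper returning the
    summed log-probability of `continuation` given `prompt` under your model.
    """
    scores = [sequence_logprob(question, choice) for choice in choices]
    return max(range(len(choices)), key=lambda i: scores[i])

# Benchmark accuracy is then the fraction of questions where this index
# matches the gold answer.
```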
Week 13: Safety & Advanced Topics
- Bias & fairness
- Privacy & memorization
- Prompt engineering patterns
Follow Along
I’ll be documenting my progress through this roadmap on this blog. Sign up below to get weekly updates on what I’m learning, code implementations, and insights from the papers I’m reading.
Key Resources
Courses
- Stanford CS224N - NLP with Deep Learning
- Stanford CS324 - Large Language Models
- Hugging Face NLP Course
Blogs
- Jay Alammar - Visual explanations
- Lilian Weng - Comprehensive surveys
- Transformer Circuits - Mechanistic interpretability
Code
- nanoGPT - Minimal GPT
- Hugging Face Transformers
- vLLM - Fast inference