I’ve been a data engineer for years: pipelines, warehouses, ETL, the unsexy infrastructure work that makes everything else possible.
Now I’m learning AI. Not because it’s trendy, but because I see where the industry is heading, and I think data engineers are better positioned for this transition than we realize.
The Data Engineer’s Advantage
Here’s what most AI tutorials don’t tell you: AI systems are data systems first.
Every retrieval-augmented generation (RAG) application needs:
- Clean, structured data pipelines
- Reliable ingestion and transformation
- Monitoring and data quality checks
- Scalable storage and retrieval
Sound familiar? It should. This is what we do.
The difference is the destination. Instead of feeding dashboards, we’re feeding models. Instead of SQL queries, we’re dealing with embeddings and vector similarity. But the principles of reliable data infrastructure don’t change.
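A minimal sketch of that shift, assuming toy three-dimensional vectors in place of real embeddings (a real system would get them from an embedding model and keep them in a vector database). The document names, the vectors, and the `top_k` helper are all made up for illustration. Where a SQL query filters rows with an exact predicate, retrieval here ranks every document by cosine similarity to the query vector:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as "not similar".
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings: invented numbers, not output from any model.
DOCS = {
    "pipeline_runbook": [0.9, 0.1, 0.0],
    "holiday_policy": [0.0, 0.2, 0.9],
    "etl_retry_guide": [0.8, 0.3, 0.1],
}
QUERY = [1.0, 0.2, 0.0]  # stands in for an embedded user question

if __name__ == "__main__":
    for doc_id, score in top_k(QUERY, DOCS):
        print(f"{doc_id}: {score:.3f}")
```

With these toy vectors the two pipeline-related documents rank above the unrelated one. The brute-force scan over every document is only workable at small scale; vector databases typically replace it with an approximate nearest-neighbor index.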
What I’m Learning
I’m currently doing an MTech in Data Engineering with specializations in:
- Generative AI - Understanding how LLMs work, not just how to call APIs
- Reinforcement Learning - Because I want to understand the full landscape
But formal education is just part of it. I’m also:
- Building projects to learn by doing
- Writing about what I learn (this blog)
- Connecting my existing knowledge to new concepts
My Learning Roadmap
Here’s how I’m approaching this transition:
Phase 1: Foundations (Current)
- LLM fundamentals - tokenization, attention, transformers
- Vector databases and embeddings
- RAG architecture patterns
Phase 2: Building
- Build a RAG system end-to-end
- Experiment with different retrieval strategies
- Understand evaluation and metrics
Phase 3: Production
- ML pipelines and MLOps
- Monitoring AI systems in production
- Cost optimization and scaling
Why I’m Writing About This
Two reasons:
- Accountability - Learning in public keeps me honest
- Helping others - If you’re a data engineer thinking about AI, I want to share what I learn so you don’t have to figure it all out alone
I’m not an expert. I’m documenting the journey.
What’s Next
My next post will be about building my first RAG system - what worked, what broke, and what I learned from a data engineering perspective.
If you’re on a similar path, I’d love to hear from you.
This is the first post in my “From Data to Intelligence” series. Follow along as I document the transition from data engineering to AI engineering.