I’ve been a data engineer for years. Pipelines, warehouses, ETL: the unsexy infrastructure work that makes everything else possible.

Now I’m learning AI. Not because it’s trendy, but because I see where the industry is heading, and I think data engineers are better positioned for this transition than we realize.

The Data Engineer’s Advantage

Here’s what most AI tutorials don’t tell you: AI systems are data systems first.

Every RAG application needs:

  • Clean, structured data pipelines
  • Reliable ingestion and transformation
  • Monitoring and data quality checks
  • Scalable storage and retrieval

Sound familiar? It should. This is what we do.
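
To make that concrete, here’s a minimal sketch of the kind of data quality check I’d expect to run on text chunks before they reach an embedding model. The function name `clean_chunks` and the `min_chars` threshold are my own placeholders, not part of any library, and a real pipeline would likely need more checks than this.

```python
def clean_chunks(raw_chunks, min_chars=20):
    """Normalize whitespace, then drop chunks that are too short or duplicated.

    Returns (kept_chunks, rejected_count) so the reject rate can be monitored.
    `min_chars` is an arbitrary illustrative threshold.
    """
    seen = set()
    kept = []
    rejected = 0
    for text in raw_chunks:
        # Collapse runs of whitespace and trim the ends.
        normalized = " ".join(text.split())
        key = normalized.lower()
        # Reject near-empty chunks and case-insensitive duplicates.
        if len(normalized) < min_chars or key in seen:
            rejected += 1
            continue
        seen.add(key)
        kept.append(normalized)
    return kept, rejected


if __name__ == "__main__":
    chunks = [
        "  Hello   world, this is a chunk of text. ",
        "hello world, this is a chunk of text.",
        "",
        "short",
    ]
    kept, rejected = clean_chunks(chunks)
    print(kept, rejected)
```

Returning the rejected count alongside the kept chunks is deliberate: it’s the same habit as tracking dropped rows in an ETL job.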

The difference is the destination. Instead of feeding dashboards, we’re feeding models. Instead of filtering rows with exact-match SQL predicates, we’re ranking documents by how similar their embeddings are to a query’s embedding. But the principles of reliable data infrastructure don’t change.
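
Here’s a small sketch of what “vector similarity” means in practice, using cosine similarity over plain Python lists. The three-dimensional vectors and document names below are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions, and a vector database would replace the brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force retrieval: score every vector, return the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings", hand-written for this example.
INDEX = {
    "orders_schema": [0.9, 0.1, 0.0],
    "billing_faq": [0.1, 0.9, 0.1],
    "oncall_runbook": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for doc_id, score in top_k([1.0, 0.0, 0.0], INDEX):
        print(doc_id, round(score, 3))
```

The mental shift from SQL is that nothing “matches” or “fails to match”: every document gets a score, and you decide how many of the top results to keep.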

What I’m Learning

I’m currently doing an MTech in Data Engineering with specializations in:

  • Generative AI - Understanding how LLMs work, not just how to call APIs
  • Reinforcement Learning - Because I want to understand the full landscape

But formal education is just part of it. I’m also:

  • Building projects to learn by doing
  • Writing about what I learn (this blog)
  • Connecting my existing knowledge to new concepts

My Learning Roadmap

Here’s how I’m approaching this transition:

Phase 1: Foundations (Current)

  • LLM fundamentals - tokenization, attention, transformers
  • Vector databases and embeddings
  • RAG architecture patterns

Phase 2: Building

  • Build a RAG system end-to-end
  • Experiment with different retrieval strategies
  • Understand evaluation and metrics

Phase 3: Production

  • ML pipelines and MLOps
  • Monitoring AI systems in production
  • Cost optimization and scaling

Why I’m Writing About This

Two reasons:

  1. Accountability - Learning in public keeps me honest
  2. Helping others - If you’re a data engineer thinking about AI, I want to share what I learn so you don’t have to figure it all out alone

I’m not an expert. I’m documenting the journey.

What’s Next

My next post will be about building my first RAG system - what worked, what broke, and what I learned from a data engineering perspective.

If you’re on a similar path, I’d love to hear from you.


This is the first post in my “From Data to Intelligence” series. Follow along as I document the transition from data engineering to AI engineering.