Your prompts are production code. Treat them accordingly.
I’ve seen teams store prompts in Slack messages, Google Docs, and even Post-it notes. When something breaks in production, the debugging conversation goes like this:
> “What prompt are we using?”
> “The one Sarah updated last week.”
> “Which version?”
> “…I think it’s in the shared drive somewhere?”
This is madness. We solved this problem for application code decades ago with version control. It’s time to apply the same discipline to prompts.
## The Problem: Prompt Drift
Prompts evolve constantly. You tweak the system message, add few-shot examples, adjust the temperature guidance. Each change affects output quality, cost, and latency.
Without version control, you lose:
- Reproducibility: You can’t recreate last Tuesday’s results
- Rollback capability: A bad prompt change means scrambling to remember the old version
- Audit trail: No record of who changed what or why
- Environment separation: Production and development use whatever someone last copy-pasted
## The Solution: A Prompt Registry
Think of it like a schema registry for your prompts. Every prompt has:
- A unique name
- An ordered list of immutable versions
- Labels pointing to specific versions (like `production`, `staging`, `development`)
Here’s the core data model:
```python
from datetime import datetime
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class PromptVersion(BaseModel):
    """A specific version of a prompt."""
    version: int
    prompt_text: str
    created_at: datetime = Field(default_factory=datetime.now)
    author: Optional[str] = None
    metadata: Dict[str, Any] = Field(default_factory=dict)


class Prompt(BaseModel):
    """A prompt with multiple versions."""
    name: str
    versions: List[PromptVersion] = Field(default_factory=list)
    labels: Dict[str, int] = Field(default_factory=dict)

    def get_version_by_number(self, version: int) -> Optional[PromptVersion]:
        """Get a specific version by its number."""
        for v in self.versions:
            if v.version == version:
                return v
        return None

    def get_version_by_label(self, label: str) -> Optional[PromptVersion]:
        """Get a specific version by its label."""
        if label not in self.labels:
            return None
        version_number = self.labels[label]
        return self.get_version_by_number(version_number)
```
The key insight: versions are immutable, labels are mutable. Version 3 always contains the same text. But production can point to version 3 today and version 2 tomorrow.
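To make that rule concrete, here's a minimal sketch using plain dicts (not the registry API): version texts never change, and a "rollback" is just re-pointing a label.

```python
# Illustrative only: versions are written once; labels are mutable pointers.
versions = {1: "v1 text", 2: "v2 text", 3: "v3 text"}  # immutable once created
labels = {"staging": 2, "production": 3}               # mutable pointers

def resolve(label: str) -> str:
    """Follow a label to the version text it currently points at."""
    return versions[labels[label]]

assert resolve("production") == "v3 text"
labels["production"] = 2           # rollback: re-point, don't rewrite
assert resolve("production") == "v2 text"
assert versions[3] == "v3 text"    # version 3 itself is untouched
```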
## The Registry Interface
The registry provides a simple API:
```python
from abc import ABC, abstractmethod


class PromptRegistry(ABC):
    @abstractmethod
    def create_prompt(self, name: str) -> Prompt:
        """Create a new prompt."""
        pass

    @abstractmethod
    def add_version(self, name: str, prompt_text: str,
                    author: Optional[str] = None) -> PromptVersion:
        """Add a new version to a prompt."""
        pass

    @abstractmethod
    def set_label(self, name: str, label: str, version: int) -> None:
        """Set a label to point to a specific version."""
        pass

    @abstractmethod
    def get_prompt(self, name: str) -> Optional[Prompt]:
        """Get a prompt by name."""
        pass
```
This abstraction lets you swap backends. Start with a local JSON file for development. Move to PostgreSQL or Redis for production. The interface stays the same.
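As an illustration of the local-file starting point, here's a minimal JSON backend sketch. It stores plain dicts instead of the Pydantic models to stay self-contained, and the class name `JsonFileRegistry` is my own; treat it as one possible shape, not the project's implementation.

```python
import json
from pathlib import Path
from typing import Optional


class JsonFileRegistry:
    """Minimal local JSON backend sketch (illustrative, simplified)."""

    def __init__(self, path: str = "prompts.json"):
        self.path = Path(path)
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self) -> None:
        # Persist the whole registry on every write; fine for local development.
        self.path.write_text(json.dumps(self._data, indent=2))

    def create_prompt(self, name: str) -> dict:
        self._data.setdefault(name, {"versions": [], "labels": {}})
        self._save()
        return self._data[name]

    def add_version(self, name: str, prompt_text: str,
                    author: Optional[str] = None) -> dict:
        prompt = self._data[name]
        version = {"version": len(prompt["versions"]) + 1,
                   "prompt_text": prompt_text, "author": author}
        prompt["versions"].append(version)  # append-only: versions are immutable
        self._save()
        return version

    def set_label(self, name: str, label: str, version: int) -> None:
        self._data[name]["labels"][label] = version
        self._save()

    def get_prompt(self, name: str) -> Optional[dict]:
        return self._data.get(name)
```

Because it exposes the same four methods, swapping in a database-backed class later is a drop-in change for callers.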
## Usage in Practice
Here’s a typical workflow:
```python
# Initialize the registry
registry = PromptRegistryFactory.create_from_config()

# Create a new prompt and add versions
registry.create_prompt("movie-critic")

v1 = registry.add_version(
    "movie-critic",
    "As a movie critic, do you like {{movie}}?",
    author="alice",
)
v2 = registry.add_version(
    "movie-critic",
    "As a {{criticlevel}} movie critic, do you like {{movie}}?",
    author="bob",
)
v3 = registry.add_version(
    "movie-critic",
    "You are a {{criticlevel}} movie critic. Write a review for {{movie}}.",
    author="alice",
)

# Set up environment labels
registry.set_label("movie-critic", "staging", 2)
registry.set_label("movie-critic", "production", 3)
```
Now your application code becomes environment-aware:
```python
import os

def get_movie_review(movie: str, critic_level: str = "professional"):
    prompt = registry.get_prompt("movie-critic")
    version = prompt.get_version_by_label(os.environ.get("ENV", "development"))
    if version is None:
        raise ValueError("No version labeled for this environment")
    template = version.prompt_text
    filled = template.replace("{{movie}}", movie)
    filled = filled.replace("{{criticlevel}}", critic_level)
    return call_llm(filled)
```
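One caveat with plain `str.replace` templating: a placeholder introduced in a newer prompt version can slip through unfilled and reach the model verbatim. A small helper can catch that (`render` is an illustrative name, not part of the registry):

```python
import re

def render(template: str, **params: str) -> str:
    """Fill {{name}} placeholders; fail loudly if any are left over."""
    for key, value in params.items():
        template = template.replace("{{" + key + "}}", value)
    leftover = re.findall(r"\{\{(\w+)\}\}", template)
    if leftover:
        raise ValueError(f"Unfilled placeholders: {leftover}")
    return template
```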
## Rollback in Seconds
Something goes wrong in production? One line:
```python
registry.set_label("movie-critic", "production", 2)
```
No code deployment. No PR review. Instant rollback to a known good state.
## Data Engineering Patterns Applied
This system applies several patterns we use daily:
**Schema Registry:** Just as Kafka schema registries manage Avro/Protobuf schemas, the prompt registry manages prompt definitions. Both provide a single source of truth with versioning.

**Immutable Infrastructure:** Versions never change once created. This guarantees reproducibility and simplifies debugging.

**Blue-Green Deployments:** Labels enable instant switching between versions. Point production to version 3, test it, then point it back to version 2 if needed.

**Configuration as Code:** Prompts are configuration. They should be versioned, reviewed, and deployed with the same rigor as application code.
## Extending the System
The base implementation uses local JSON storage. For production, extend it:
```python
class PostgresRegistry(PromptRegistry):
    """Store prompts in PostgreSQL for durability and querying."""

    def _load(self):
        # Load from database
        pass

    def _save(self):
        # Write to database with transaction
        pass
```
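To show the durability idea without a running Postgres, here's a stdlib `sqlite3` analogue. The schema, class name, and method names are assumptions for illustration; the point is that each write happens inside a transaction, via the connection context manager.

```python
import json
import sqlite3
from typing import Optional


class SqliteRegistry:
    """Illustrative database-backed sketch: durable, transactional writes."""

    def __init__(self, path: str = "prompts.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prompts (name TEXT PRIMARY KEY, doc TEXT)")

    def save(self, name: str, doc: dict) -> None:
        with self.conn:  # context manager commits, or rolls back on error
            self.conn.execute(
                "INSERT INTO prompts(name, doc) VALUES(?, ?) "
                "ON CONFLICT(name) DO UPDATE SET doc = excluded.doc",
                (name, json.dumps(doc)))

    def load(self, name: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT doc FROM prompts WHERE name = ?", (name,)).fetchone()
        return json.loads(row[0]) if row else None
```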
You can also integrate with CI/CD:
```yaml
# .github/workflows/prompt-deploy.yml
on:
  push:
    paths:
      - 'prompts/**'

jobs:
  deploy-prompts:
    runs-on: ubuntu-latest
    steps:
      - name: Update prompt registry
        run: python scripts/update_prompts.py
```
## What’s Next
In the next post, we’ll build a complete RAG pipeline with Apache Airflow. We’ll use this prompt registry to manage the prompts for our retrieval and generation steps.
The code for this project is available at GitHub.
This is Part 2 of the “Data Engineering Meets AI” series. Read Part 1: RAG is Just ETL