Your prompts are production code. Treat them accordingly.
I’ve seen teams store prompts in Slack messages, Google Docs, and even Post-it notes. When something breaks in production, the debugging conversation goes like this:
> “What prompt are we using?”
> “The one Sarah updated last week.”
> “Which version?”
> “…I think it’s in the shared drive somewhere?”
This is madness. We solved this problem for application code decades ago with version control. It’s time to apply the same discipline to prompts.
## The Problem: Prompt Drift
Prompts evolve constantly. You tweak the system message, add few-shot examples, adjust the temperature guidance. Each change affects output quality, cost, and latency.
Without version control, you lose:
- Reproducibility: You can’t recreate last Tuesday’s results
- Rollback capability: A bad prompt change means scrambling to remember the old version
- Audit trail: No record of who changed what or why
- Environment separation: Production and development use whatever someone last copy-pasted
## The Solution: A Prompt Registry
Think of it like a schema registry for your prompts. Every prompt has:
- A unique name
- An ordered list of immutable versions
- Labels pointing to specific versions (like `production`, `staging`, `development`)
Here’s the core data model:
```python
from datetime import datetime
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class PromptVersion(BaseModel):
    """A specific version of a prompt."""
    version: int
    prompt_text: str
    created_at: datetime = Field(default_factory=datetime.now)
    author: Optional[str] = None
    metadata: Dict[str, Any] = Field(default_factory=dict)


class Prompt(BaseModel):
    """A prompt with multiple versions."""
    name: str
    versions: List[PromptVersion] = Field(default_factory=list)
    labels: Dict[str, int] = Field(default_factory=dict)

    def get_version_by_number(self, version: int) -> Optional[PromptVersion]:
        """Get a specific version by its number."""
        for v in self.versions:
            if v.version == version:
                return v
        return None

    def get_version_by_label(self, label: str) -> Optional[PromptVersion]:
        """Get a specific version by its label."""
        if label not in self.labels:
            return None
        version_number = self.labels[label]
        return self.get_version_by_number(version_number)
```
The key insight: versions are immutable, labels are mutable. Version 3 always contains the same text. But production can point to version 3 today and version 2 tomorrow.
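To make that rule concrete, here's a minimal sketch using plain dicts (not the registry API): version texts never change, and a "rollback" is just re-pointing a label.

```python
# Illustrative only: versions are written once; labels are mutable pointers.
versions = {1: "v1 text", 2: "v2 text", 3: "v3 text"}  # immutable once created
labels = {"staging": 2, "production": 3}               # mutable pointers

def resolve(label: str) -> str:
    """Follow a label to the version text it currently points at."""
    return versions[labels[label]]

assert resolve("production") == "v3 text"
labels["production"] = 2           # rollback: re-point, don't rewrite
assert resolve("production") == "v2 text"
assert versions[3] == "v3 text"    # version 3 itself is untouched
```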
## The Registry Interface
The registry provides a simple API:
```python
from abc import ABC, abstractmethod


class PromptRegistry(ABC):
    @abstractmethod
    def create_prompt(self, name: str) -> Prompt:
        """Create a new prompt."""
        pass

    @abstractmethod
    def add_version(self, name: str, prompt_text: str,
                    author: Optional[str] = None) -> PromptVersion:
        """Add a new version to a prompt."""
        pass

    @abstractmethod
    def set_label(self, name: str, label: str, version: int) -> None:
        """Set a label to point to a specific version."""
        pass

    @abstractmethod
    def get_prompt(self, name: str) -> Optional[Prompt]:
        """Get a prompt by name."""
        pass
```
This abstraction lets you swap backends. Start with a local JSON file for development. Move to PostgreSQL or Redis for production. The interface stays the same.
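As an illustration of the local-file starting point, here's a minimal JSON backend sketch. It stores plain dicts instead of the Pydantic models to stay self-contained, and the class name `JsonFileRegistry` is my own; treat it as one possible shape, not the project's implementation.

```python
import json
from pathlib import Path
from typing import Optional


class JsonFileRegistry:
    """Minimal local JSON backend sketch (illustrative, simplified)."""

    def __init__(self, path: str = "prompts.json"):
        self.path = Path(path)
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self) -> None:
        # Persist the whole registry on every write; fine for local development.
        self.path.write_text(json.dumps(self._data, indent=2))

    def create_prompt(self, name: str) -> dict:
        self._data.setdefault(name, {"versions": [], "labels": {}})
        self._save()
        return self._data[name]

    def add_version(self, name: str, prompt_text: str,
                    author: Optional[str] = None) -> dict:
        prompt = self._data[name]
        version = {"version": len(prompt["versions"]) + 1,
                   "prompt_text": prompt_text, "author": author}
        prompt["versions"].append(version)  # append-only: versions are immutable
        self._save()
        return version

    def set_label(self, name: str, label: str, version: int) -> None:
        self._data[name]["labels"][label] = version
        self._save()

    def get_prompt(self, name: str) -> Optional[dict]:
        return self._data.get(name)
```

Because it exposes the same four methods, swapping in a database-backed class later is a drop-in change for callers.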
## Usage in Practice
Here’s a typical workflow:
```python
# Initialize the registry
registry = PromptRegistryFactory.create_from_config()

# Create a new prompt and add versions
registry.create_prompt("movie-critic")

v1 = registry.add_version(
    "movie-critic",
    "As a movie critic, do you like {{movie}}?",
    author="alice",
)
v2 = registry.add_version(
    "movie-critic",
    "As a {{criticlevel}} movie critic, do you like {{movie}}?",
    author="bob",
)
v3 = registry.add_version(
    "movie-critic",
    "You are a {{criticlevel}} movie critic. Write a review for {{movie}}.",
    author="alice",
)

# Set up environment labels
registry.set_label("movie-critic", "staging", 2)
registry.set_label("movie-critic", "production", 3)
```
Now your application code becomes environment-aware:
```python
import os

def get_movie_review(movie: str, critic_level: str = "professional"):
    prompt = registry.get_prompt("movie-critic")
    version = prompt.get_version_by_label(os.environ.get("ENV", "development"))
    if version is None:
        raise ValueError("No version labeled for this environment")
    template = version.prompt_text
    filled = template.replace("{{movie}}", movie)
    filled = filled.replace("{{criticlevel}}", critic_level)
    return call_llm(filled)
```
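One caveat with plain `str.replace` templating: a placeholder introduced in a newer prompt version can slip through unfilled and reach the model verbatim. A small helper can catch that (`render` is an illustrative name, not part of the registry):

```python
import re

def render(template: str, **params: str) -> str:
    """Fill {{name}} placeholders; fail loudly if any are left over."""
    for key, value in params.items():
        template = template.replace("{{" + key + "}}", value)
    leftover = re.findall(r"\{\{(\w+)\}\}", template)
    if leftover:
        raise ValueError(f"Unfilled placeholders: {leftover}")
    return template
```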
## Rollback in Seconds
Something goes wrong in production? One line:
```python
registry.set_label("movie-critic", "production", 2)
```
No code deployment. No PR review. Instant rollback to a known good state.
## Data Engineering Patterns Applied
This system applies several patterns we use daily:
**Schema Registry:** Just as Kafka schema registries manage Avro/Protobuf schemas, the prompt registry manages prompt definitions. Both provide a single source of truth with versioning.

**Immutable Infrastructure:** Versions never change once created. This guarantees reproducibility and simplifies debugging.

**Blue-Green Deployments:** Labels enable instant switching between versions. Point production to version 3, test it, then point it back to version 2 if needed.

**Configuration as Code:** Prompts are configuration. They should be versioned, reviewed, and deployed with the same rigor as application code.
## Extending the System
The base implementation uses local JSON storage. For production, extend it:
```python
class PostgresRegistry(PromptRegistry):
    """Store prompts in PostgreSQL for durability and querying."""

    def _load(self):
        # Load from database
        pass

    def _save(self):
        # Write to database with transaction
        pass
```
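To show the durability idea without a running Postgres, here's a stdlib `sqlite3` analogue. The schema, class name, and method names are assumptions for illustration; the point is that each write happens inside a transaction, via the connection context manager.

```python
import json
import sqlite3
from typing import Optional


class SqliteRegistry:
    """Illustrative database-backed sketch: durable, transactional writes."""

    def __init__(self, path: str = "prompts.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prompts (name TEXT PRIMARY KEY, doc TEXT)")

    def save(self, name: str, doc: dict) -> None:
        with self.conn:  # context manager commits, or rolls back on error
            self.conn.execute(
                "INSERT INTO prompts(name, doc) VALUES(?, ?) "
                "ON CONFLICT(name) DO UPDATE SET doc = excluded.doc",
                (name, json.dumps(doc)))

    def load(self, name: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT doc FROM prompts WHERE name = ?", (name,)).fetchone()
        return json.loads(row[0]) if row else None
```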
You can also integrate with CI/CD:
```yaml
# .github/workflows/prompt-deploy.yml
on:
  push:
    paths:
      - 'prompts/**'

jobs:
  deploy-prompts:
    runs-on: ubuntu-latest
    steps:
      - name: Update prompt registry
        run: python scripts/update_prompts.py
```
## What’s Next
In the next post, we’ll build a complete RAG pipeline with Apache Airflow. We’ll use this prompt registry to manage the prompts for our retrieval and generation steps.
The code for this project is available at GitHub.
This is Part 2 of the “Data Engineering Meets AI” series. Read Part 1: RAG is Just ETL