
LLMOps, AIOps & MLOps

Just as DevOps revolutionized software delivery, operational disciplines for AI are critical for building and running reliable AI systems.

MLOps (Machine Learning Operations)

The traditional discipline for training and deploying custom models.

  • Focus: Training pipelines, data versioning, model registry, inference serving.
  • Target: Data Scientists building models from scratch (e.g., a fraud detection classifier).

LLMOps (Large Language Model Operations)

A specialized subset of MLOps for Generative AI.

  • Focus: Prompt management, evaluation (evals), RAG retrieval quality, model chaining.
  • Target: AI Engineers building apps with GPT/Claude.

AIOps (Artificial Intelligence for IT Operations)


Using AI to improve IT operations itself.

  • Focus: Anomaly detection, automated incident response, log analysis (see the anomaly-detection sketch below).
  • Target: SREs and DevOps engineers.
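
As a rough illustration of the anomaly-detection bullet above, here is a minimal sketch that flags a metric sample whose z-score against recent history is unusually large. The metric, values, and threshold are illustrative assumptions, not any particular tool's API.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag a sample whose z-score against recent history exceeds the threshold."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

# Hypothetical example: p95 request latencies (ms) from recent scrape intervals.
recent_p95_ms = [120.0, 118.0, 125.0, 122.0, 119.0]
print(is_anomalous(recent_p95_ms, 480.0))  # True -> likely worth an alert or auto-remediation
```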
The LLMOps Lifecycle

graph LR
    Dev[Development] --> Eval[Evaluation]
    Eval --> Deploy[Deployment]
    Deploy --> Monitor[Monitoring]
    Monitor -->|Feedback| Dev
    
    subgraph Development
    A[Prompt Engineering]
    B[Playground Testing]
    end
    
    subgraph Evaluation
    C[Golden Datasets]
    D[Automated Evals]
    end
    
    subgraph Monitoring
    E[Cost / Latency]
    F[Quality / Drift]
    G[User Feedback]
    end
Key practices in an LLMOps workflow (minimal sketches of each follow the list):

  1. Prompt Versioning: Treating prompts as code. Prompts should be stored in version control (Git) or in a specialized prompt registry.
  2. Evals (Evaluations): Automated unit tests for AI.
    • Input: “Summarize this email.”
    • Check: Does the summary mention the deadline? (Boolean check).
  3. Tracing: Following the chain of execution to answer questions like “Which step in the agent workflow failed?”
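
A minimal prompt-versioning sketch, assuming prompts are kept as plain template files inside the repository so changes go through normal code review; the prompts/ directory and file names here are hypothetical.

```python
from pathlib import Path

# Prompts live as plain text templates in the repo (e.g. prompts/summarize_email.txt),
# so every change is reviewed and versioned in Git like any other code change.
PROMPT_DIR = Path("prompts")

def load_prompt(name: str, **variables: str) -> str:
    """Load a prompt template from the repo and fill in its variables."""
    template = (PROMPT_DIR / f"{name}.txt").read_text(encoding="utf-8")
    return template.format(**variables)

# prompts/summarize_email.txt might contain:
#   "Summarize this email and keep any deadlines: {email}"
prompt = load_prompt("summarize_email", email="Reminder: the report is due on March 3.")
```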
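
A minimal eval sketch for the “does the summary mention the deadline?” check, run over a tiny golden dataset. The summarize callable is a placeholder for whatever invokes the model; the data and names are illustrative.

```python
golden_dataset = [
    # Golden examples: known inputs plus the fact the output must preserve.
    {"email": "Reminder: the quarterly report is due on March 3.", "deadline": "March 3"},
    {"email": "Please send your feedback before Friday noon.", "deadline": "Friday"},
]

def mentions_deadline(summary: str, deadline: str) -> bool:
    """Boolean check: does the generated summary contain the expected deadline?"""
    return deadline.lower() in summary.lower()

def run_evals(summarize) -> float:
    """Run the check over every golden example and return the pass rate."""
    passed = [mentions_deadline(summarize(ex["email"]), ex["deadline"]) for ex in golden_dataset]
    return sum(passed) / len(passed)
```

In practice, the pass rate would be tracked per prompt version so regressions are caught before deployment.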
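
A bare-bones tracing sketch (no specific library assumed): each step of a hypothetical agent workflow records its name, duration, and outcome, so a failed run points directly at the step that broke.

```python
import time

def run_step(trace: list, name: str, fn, *args, **kwargs):
    """Execute one workflow step and append a span (name, duration, outcome) to the trace."""
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        trace.append({"step": name, "ok": True, "ms": round((time.perf_counter() - start) * 1000, 1)})
        return result
    except Exception as err:
        trace.append({"step": name, "ok": False,
                      "ms": round((time.perf_counter() - start) * 1000, 1), "error": str(err)})
        raise

# Hypothetical agent workflow: retrieve -> generate.
trace: list = []
docs = run_step(trace, "retrieve", lambda q: ["doc1", "doc2"], "refund policy")
answer = run_step(trace, "generate", lambda d: "Refunds take 5 days.", docs)
# Inspecting `trace` afterwards shows which step failed and how long each one took.
```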
A typical tooling stack for these practices:

| Category | Tools | Purpose |
| --- | --- | --- |
| Model Providers | Azure OpenAI, Bedrock, Fireworks | Hosting the LLMs. |
| Orchestration | LangChain, LangGraph, Semantic Kernel | Glue code for apps. |
| Vector DB | Qdrant, Weaviate, Pinecone | Knowledge storage. |
| LLMOps / Evals | Langfuse, Arize Phoenix, PromptLayer | Monitoring and testing. |