
LLMOps, AIOps & MLOps

Just as DevOps revolutionized software delivery, operational disciplines for AI are critical for building and running reliable AI systems.

MLOps (Machine Learning Operations)

The traditional discipline for training and deploying custom models.

  • Focus: Training pipelines, data versioning, model registry, inference serving.
  • Target: Data Scientists building models from scratch (e.g., a fraud detection classifier).

LLMOps (Large Language Model Operations)

A specialized subset of MLOps for Generative AI.

  • Focus: Prompt management, evaluation (evals), RAG retrieval quality, model chaining.
  • Target: AI Engineers building apps with GPT/Claude.

AIOps (Artificial Intelligence for IT Operations)


Using AI to improve IT operations itself.

  • Focus: Anomaly detection, automated incident response, log analysis (see the anomaly-detection sketch below).
  • Target: SREs and DevOps engineers.
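
As a rough illustration of the anomaly-detection bullet above, here is a minimal sketch that flags a metric sample whose z-score against recent history is unusually large. The metric, values, and threshold are illustrative assumptions, not any particular tool's API.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag a sample whose z-score against recent history exceeds the threshold."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

# Hypothetical example: p95 request latencies (ms) from recent scrape intervals.
recent_p95_ms = [120.0, 118.0, 125.0, 122.0, 119.0]
print(is_anomalous(recent_p95_ms, 480.0))  # True -> likely worth an alert or auto-remediation
```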
The LLMOps Lifecycle

graph LR
    Dev[Development] --> Eval[Evaluation]
    Eval --> Deploy[Deployment]
    Deploy --> Monitor[Monitoring]
    Monitor -->|Feedback| Dev
    
    subgraph Development
    A[Prompt Engineering]
    B[Playground Testing]
    end
    
    subgraph Evaluation
    C[Golden Datasets]
    D[Automated Evals]
    end
    
    subgraph Monitoring
    E[Cost / Latency]
    F[Quality / Drift]
    G[User Feedback]
    end
Key practices in an LLMOps workflow (minimal sketches of each follow the list):

  1. Prompt Versioning: Treating prompts as code. Prompts should be stored in version control (Git) or in a specialized prompt registry.
  2. Evals (Evaluations): Automated unit tests for AI.
    • Input: “Summarize this email.”
    • Check: Does the summary mention the deadline? (Boolean check).
  3. Tracing: Following the chain of execution to answer questions like “Which step in the agent workflow failed?”
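
A minimal prompt-versioning sketch, assuming prompts are kept as plain template files inside the repository so changes go through normal code review; the prompts/ directory and file names here are hypothetical.

```python
from pathlib import Path

# Prompts live as plain text templates in the repo (e.g. prompts/summarize_email.txt),
# so every change is reviewed and versioned in Git like any other code change.
PROMPT_DIR = Path("prompts")

def load_prompt(name: str, **variables: str) -> str:
    """Load a prompt template from the repo and fill in its variables."""
    template = (PROMPT_DIR / f"{name}.txt").read_text(encoding="utf-8")
    return template.format(**variables)

# prompts/summarize_email.txt might contain:
#   "Summarize this email and keep any deadlines: {email}"
prompt = load_prompt("summarize_email", email="Reminder: the report is due on March 3.")
```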
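
A minimal eval sketch for the “does the summary mention the deadline?” check, run over a tiny golden dataset. The summarize callable is a placeholder for whatever invokes the model; the data and names are illustrative.

```python
golden_dataset = [
    # Golden examples: known inputs plus the fact the output must preserve.
    {"email": "Reminder: the quarterly report is due on March 3.", "deadline": "March 3"},
    {"email": "Please send your feedback before Friday noon.", "deadline": "Friday"},
]

def mentions_deadline(summary: str, deadline: str) -> bool:
    """Boolean check: does the generated summary contain the expected deadline?"""
    return deadline.lower() in summary.lower()

def run_evals(summarize) -> float:
    """Run the check over every golden example and return the pass rate."""
    passed = [mentions_deadline(summarize(ex["email"]), ex["deadline"]) for ex in golden_dataset]
    return sum(passed) / len(passed)
```

In practice, the pass rate would be tracked per prompt version so regressions are caught before deployment.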
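
A bare-bones tracing sketch (no specific library assumed): each step of a hypothetical agent workflow records its name, duration, and outcome, so a failed run points directly at the step that broke.

```python
import time

def run_step(trace: list, name: str, fn, *args, **kwargs):
    """Execute one workflow step and append a span (name, duration, outcome) to the trace."""
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        trace.append({"step": name, "ok": True, "ms": round((time.perf_counter() - start) * 1000, 1)})
        return result
    except Exception as err:
        trace.append({"step": name, "ok": False,
                      "ms": round((time.perf_counter() - start) * 1000, 1), "error": str(err)})
        raise

# Hypothetical agent workflow: retrieve -> generate.
trace: list = []
docs = run_step(trace, "retrieve", lambda q: ["doc1", "doc2"], "refund policy")
answer = run_step(trace, "generate", lambda d: "Refunds take 5 days.", docs)
# Inspecting `trace` afterwards shows which step failed and how long each one took.
```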
A typical tooling stack for these practices:

| Category | Tools | Purpose |
| --- | --- | --- |
| Model Providers | Azure OpenAI, Bedrock, Fireworks | Hosting the LLMs. |
| Orchestration | LangChain, LangGraph, Semantic Kernel | Glue code for apps. |
| Vector DB | Qdrant, Weaviate, Pinecone | Knowledge storage. |
| LLMOps / Evals | Langfuse, Arize Phoenix, PromptLayer | Monitoring and testing. |