MLOps with MLFlow and OpenTelemetry

February 21, 2026 · 2 min read · Agentic Assistants

Implementing experiment tracking, model registry, and distributed tracing for ML workloads using MLFlow and OpenTelemetry.

Tags: MLFlow, OpenTelemetry, MLOps, Observability, Monitoring

Why MLOps Matters

Training a model is 10% of the work. The other 90% is tracking experiments, managing model versions, monitoring performance, and debugging failures in production. Agentic Assistants integrates MLFlow for experiment tracking and OpenTelemetry for distributed tracing.

MLFlow Integration

The MLFlow tracker (core/mlflow_tracker.py) wraps every training run, agent execution, and pipeline invocation with automatic logging:

import mlflow

with mlflow.start_run(run_name="mistral-7b-qlora-v3"):
    mlflow.log_params({
        "base_model": "mistralai/Mistral-7B-v0.3",
        "method": "qlora",
        "lora_r": 16,
        "learning_rate": 2e-4,
    })

    # train() and config stand in for the project's own training entry point
    model = train(config)

    mlflow.log_metrics({
        "eval_loss": model.eval_loss,
        "eval_perplexity": model.eval_perplexity,
    })

    mlflow.log_artifact("./models/checkpoint-final")

The framework automatically logs:

Training runs -- Hyperparameters, loss curves, evaluation metrics, model artifacts
Agent executions -- Input/output pairs, tool calls, execution time, token usage
Pipeline runs -- Node execution times, dataset lineage, intermediate artifacts
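
To make the agent-execution entry concrete, here is a minimal sketch of how one execution could be reduced to metrics and logged. The helper names (summarize_agent_run, log_agent_run), the metric keys, and the agent_io.json artifact name are assumptions for illustration, not the contents of core/mlflow_tracker.py. The MLFlow calls used (start_run, log_metrics, log_dict) are real, and MLFlow is imported inside the logging function so the summary logic runs without a tracking server.

```python
import time


def summarize_agent_run(started, finished, tool_calls, prompt_tokens, completion_tokens):
    """Reduce one agent execution to flat numeric metrics."""
    return {
        "execution_time_s": round(finished - started, 3),
        "tool_call_count": len(tool_calls),
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


def log_agent_run(run_name, user_input, output, tool_calls, metrics):
    """Write the metrics and the raw input/output pair to a new MLFlow run."""
    import mlflow  # lazy import: only needed when a tracking server is in use

    with mlflow.start_run(run_name=run_name):
        mlflow.log_metrics(metrics)
        # log_dict stores structured data as a JSON artifact attached to the run
        mlflow.log_dict(
            {"input": user_input, "output": output, "tool_calls": tool_calls},
            "agent_io.json",
        )


# Example: time a placeholder agent call and build its metrics
started = time.perf_counter()
answer = "42"  # stand-in for a real agent response
finished = time.perf_counter()
metrics = summarize_agent_run(started, finished, ["web_search"], 120, 30)
```

Calling log_agent_run("agent-demo", "question", answer, ["web_search"], metrics) would then create one run per execution, which keeps agent runs comparable in the same UI as training runs.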

Model Registry

MLFlow's model registry provides version control for trained models. Each model version moves through a lifecycle: staging -> production -> archived. The framework's serving module reads directly from the registry, so promoting a version to production automatically makes it available for inference, with no separate deployment step.
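
The lifecycle can be sketched as a small transition table checked before the registry is asked to move a version. The names ALLOWED_TRANSITIONS, check_transition, and promote are hypothetical, and the strict one-way policy is an assumption read off the lifecycle described above. transition_model_version_stage is a real MlflowClient method, though recent MLFlow releases mark stage transitions as deprecated in favor of aliases, so treat this as an illustration of the idea rather than a recommended API.

```python
# Stage names as used by MLFlow's stage-based registry API
ALLOWED_TRANSITIONS = {
    "None": {"Staging"},
    "Staging": {"Production"},
    "Production": {"Archived"},
    "Archived": set(),
}


def check_transition(current, target):
    """Raise ValueError unless current -> target follows the lifecycle."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move a model version from {current!r} to {target!r}")


def promote(model_name, version, current, target):
    """Validate the move locally, then ask the registry to apply it."""
    from mlflow.tracking import MlflowClient  # lazy import: needs a reachable registry

    check_transition(current, target)
    MlflowClient().transition_model_version_stage(
        name=model_name,
        version=version,
        stage=target,
        # retire the previous production version when a new one takes over
        archive_existing_versions=(target == "Production"),
    )
```

Checking the transition locally first means an invalid promotion fails fast, before any call reaches the registry.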

OpenTelemetry Tracing

For distributed workloads (multi-agent pipelines, Kubernetes-deployed services), you need more than metrics -- you need traces. The telemetry module (core/telemetry.py) instruments every major operation with OpenTelemetry spans:

Agent Execution (root span)
  ├── LLM Inference (Ollama)
  │   ├── Prompt Construction
  │   └── Model Completion
  ├── Tool: web_search
  │   ├── Query Planning
  │   └── Result Processing
  └── Memory: retrieve context
      ├── Vector Search
      └── Ranking

Traces are exported to Jaeger, where you can visualize the full execution path, identify bottlenecks, and debug failures.
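
The span tree above maps directly onto nested context managers. The sketch below is not the code in core/telemetry.py: RecordingTracer is a hypothetical stand-in that mimics the start_as_current_span() context manager of an OpenTelemetry tracer and simply records span names, the span names themselves are assumed, and the work inside each span is placeholder logic.

```python
from contextlib import contextmanager


class RecordingTracer:
    """Hypothetical stand-in that mimics tracer.start_as_current_span() and
    records (depth, name) pairs instead of exporting spans."""

    def __init__(self):
        self.spans = []
        self._depth = 0

    @contextmanager
    def start_as_current_span(self, name):
        self.spans.append((self._depth, name))
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1


def run_agent(tracer, query):
    """Placeholder agent whose nested `with` blocks reproduce the span tree."""
    with tracer.start_as_current_span("agent_execution"):
        with tracer.start_as_current_span("llm_inference"):
            with tracer.start_as_current_span("prompt_construction"):
                prompt = f"Question: {query}"
            with tracer.start_as_current_span("model_completion"):
                draft = prompt.lower()  # stand-in for the model call
        with tracer.start_as_current_span("tool.web_search"):
            with tracer.start_as_current_span("query_planning"):
                search_terms = query.split()
            with tracer.start_as_current_span("result_processing"):
                results = [term.upper() for term in search_terms]
        with tracer.start_as_current_span("memory.retrieve_context"):
            with tracer.start_as_current_span("vector_search"):
                candidates = sorted(results)
            with tracer.start_as_current_span("ranking"):
                context = candidates[:3]
        return {"draft": draft, "context": context}


tracer = RecordingTracer()
result = run_agent(tracer, "what is mlops")
for depth, name in tracer.spans:
    print("  " * depth + name)  # indented outline of the recorded spans
```

With the opentelemetry-api package installed, passing opentelemetry.trace.get_tracer(__name__) instead of RecordingTracer should work unchanged, since run_agent only relies on the start_as_current_span() context manager.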

Docker Compose Stack

The MLOps stack runs alongside the main application:

services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    ports: ["5000:5000"]
    volumes:
      - mlflow-data:/mlflow
    environment:
      MLFLOW_BACKEND_STORE_URI: postgresql://...

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686", "4317:4317"]

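  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      # the contrib image loads its config from /etc/otelcol-contrib/ by default
      - ./docker/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml

# named volumes referenced by services must be declared at the top level
volumes:
  mlflow-data: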
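
On the application side, a client needs to know where these services listen. The following is a sketch under assumptions: stack_endpoints and configure_clients are hypothetical helpers, the defaults mirror the ports published in the Compose file, and MLFLOW_TRACKING_URI and OTEL_EXPORTER_OTLP_ENDPOINT are the standard environment variables for each tool. The third-party imports are deferred so the endpoint resolution runs without the packages installed.

```python
import os


def stack_endpoints(env=None):
    """Resolve service URLs from standard environment variables, falling back
    to the ports published in the Compose file."""
    env = os.environ if env is None else env
    return {
        "mlflow": env.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        "otlp": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
    }


def configure_clients(service_name, endpoints):
    """Point MLFlow and the OpenTelemetry SDK at the running stack."""
    # Lazy imports: require mlflow, opentelemetry-sdk and
    # opentelemetry-exporter-otlp-proto-grpc to be installed.
    import mlflow
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    mlflow.set_tracking_uri(endpoints["mlflow"])
    provider = TracerProvider(resource=Resource.create({"service.name": service_name}))
    # insecure=True: plain gRPC, suitable only for a local development stack
    exporter = OTLPSpanExporter(endpoint=endpoints["otlp"], insecure=True)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return trace.get_tracer(service_name)


endpoints = stack_endpoints({})  # empty mapping -> Compose defaults
```

Inside the Compose network the same code would be given service hostnames through the environment variables rather than localhost.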

Practical Benefits

With MLFlow and OpenTelemetry running, every experiment is reproducible (you can see the exact parameters used), every failure is diagnosable (traces show where things went wrong), and model promotion is safe (you can compare staging vs production metrics before switching).
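
The staging-versus-production comparison can be as simple as a gate that blocks promotion on any regression. This is an illustrative sketch: compare_for_promotion is a hypothetical helper, the metric values are made up for the example, and treating eval_loss and eval_perplexity as lower-is-better is an assumption based on the metrics logged earlier.

```python
def compare_for_promotion(staging, production,
                          lower_is_better=("eval_loss", "eval_perplexity"),
                          tolerance=0.0):
    """Return (ok, regressions). ok is True when staging is no worse than
    production on every metric the two versions share."""
    regressions = {}
    for name in sorted(set(staging) & set(production)):
        s, p = staging[name], production[name]
        if name in lower_is_better:
            worse = s > p + tolerance
        else:
            worse = s < p - tolerance
        if worse:
            regressions[name] = (s, p)  # (staging value, production value)
    return (not regressions, regressions)


# Made-up metric values, for illustration only
staging = {"eval_loss": 1.82, "eval_perplexity": 6.17}
production = {"eval_loss": 1.91, "eval_perplexity": 6.75}
ok, regressions = compare_for_promotion(staging, production)
```

Returning the regressions alongside the verdict lets a promotion script report which metric blocked the switch instead of failing silently.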

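The overhead is minimal -- OpenTelemetry adds single-digit milliseconds to each operation, and MLFlow logging can run asynchronously so it does not block training. Note that MLFlow logs synchronously unless asynchronous logging is enabled.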
