Julian Wiley

MLOps with MLFlow and OpenTelemetry

February 21, 2026 · 2 min read · Agentic Assistants

Implementing experiment tracking, model registry, and distributed tracing for ML workloads using MLFlow and OpenTelemetry.

Tags: MLFlow, OpenTelemetry, MLOps, Observability, Monitoring

Why MLOps Matters

Training a model is maybe 10% of the work. The rest is tracking experiments, managing model versions, monitoring performance, and debugging failures in production. Agentic Assistants integrates MLFlow for experiment tracking and OpenTelemetry for distributed tracing.

MLFlow Integration

The MLFlow tracker (core/mlflow_tracker.py) wraps every training run, agent execution, and pipeline invocation with automatic logging:

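import mlflow

# `train` and `config` come from the project's own training code (not shown here).
with mlflow.start_run(run_name="mistral-7b-qlora-v3"):
    # Record the hyperparameters that define this run
    mlflow.log_params({
        "base_model": "mistralai/Mistral-7B-v0.3",
        "method": "qlora",
        "lora_r": 16,
        "learning_rate": 2e-4,
    })

    model = train(config)

    # Record the final evaluation metrics
    mlflow.log_metrics({
        "eval_loss": model.eval_loss,
        "eval_perplexity": model.eval_perplexity,
    })

    # Upload the final checkpoint (log_artifact accepts a file or a directory)
    mlflow.log_artifact("./models/checkpoint-final")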

The framework automatically logs:

  • Training runs -- Hyperparameters, loss curves, evaluation metrics, model artifacts
  • Agent executions -- Input/output pairs, tool calls, execution time, token usage
  • Pipeline runs -- Node execution times, dataset lineage, intermediate artifacts
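To make the agent-execution case concrete, here is a minimal sketch of what such a wrapper could look like. It is not the code in core/mlflow_tracker.py: the function name `log_agent_execution`, the metric names, and the `tracker` parameter are assumptions made for illustration. The only MLFlow calls it relies on are `start_run`, `log_param`, `log_metric`, and `log_text`.

```python
import time


def log_agent_execution(agent_fn, prompt, tracker=None, run_name="agent-execution"):
    """Run agent_fn(prompt) inside a tracking run and log basic execution data.

    `tracker` is any object exposing MLflow's fluent API (start_run, log_param,
    log_metric, log_text). It defaults to the real `mlflow` module.
    """
    if tracker is None:
        import mlflow  # imported lazily so the sketch can be tried without MLflow installed

        tracker = mlflow

    with tracker.start_run(run_name=run_name):
        tracker.log_param("prompt_chars", len(prompt))

        started = time.perf_counter()
        output = agent_fn(prompt)
        elapsed = time.perf_counter() - started

        # Hypothetical metric names; a real tracker would also record tool calls and token usage
        tracker.log_metric("execution_seconds", elapsed)
        tracker.log_metric("output_chars", len(output))
        tracker.log_text(output, "output.txt")

    return output
```

Passing the tracker in as a parameter keeps the wrapper easy to unit test with a stub, which matters once every agent call goes through it.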

Model Registry

MLFlow's model registry provides version control for trained models. Each model version moves through lifecycle stages: Staging -> Production -> Archived. The framework's serving module reads directly from the registry, so promoting a version to Production automatically makes it available for inference.
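A promotion step might look roughly like the sketch below. The helper name `promote_to_production` is hypothetical, and `client` is assumed to behave like `mlflow.tracking.MlflowClient`. The stage-based call shown here, `transition_model_version_stage`, is deprecated in recent MLFlow releases in favor of model aliases, so treat this as an illustration of the flow rather than the recommended API for new projects.

```python
def promote_to_production(client, model_name, version):
    """Move one registered model version to Production and archive the old one.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    return client.transition_model_version_stage(
        name=model_name,
        version=str(version),
        stage="Production",
        # Archive whatever version currently holds the Production stage
        archive_existing_versions=True,
    )
```

With a tracking server running, this would be called as `promote_to_production(MlflowClient(), "my-model", 3)`, where the model name and version are placeholders.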

OpenTelemetry Tracing

For distributed workloads (multi-agent pipelines, Kubernetes-deployed services), you need more than metrics -- you need traces. The telemetry module (core/telemetry.py) instruments every major operation with OpenTelemetry spans:

Agent Execution (root span)
  ├── LLM Inference (Ollama)
  │   ├── Prompt Construction
  │   └── Model Completion
  ├── Tool: web_search
  │   ├── Query Planning
  │   └── Result Processing
  └── Memory: retrieve context
      ├── Vector Search
      └── Ranking

Traces are exported to Jaeger, where you can visualize the full execution path, identify bottlenecks, and debug failures.
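A span tree like the one above comes from nesting `start_as_current_span` blocks. The sketch below shows a simplified version with one root span and two children. It is not the code in core/telemetry.py: `run_agent`, the `llm` and `search` callables, and the attribute name are assumptions for illustration. The `tracer` argument is expected to behave like an OpenTelemetry `Tracer`.

```python
def run_agent(prompt, llm, search, tracer=None):
    """Answer a prompt, wrapping each step in a span nested under one root span."""
    if tracer is None:
        from opentelemetry import trace  # lazy import; requires opentelemetry-api

        tracer = trace.get_tracer("agent-sketch")

    with tracer.start_as_current_span("Agent Execution") as root:
        root.set_attribute("prompt.chars", len(prompt))

        # Child span: opened while the root span is current, so it nests under it
        with tracer.start_as_current_span("Tool: web_search"):
            results = search(prompt)

        # Sibling child span for the model call
        with tracer.start_as_current_span("LLM Inference"):
            answer = llm(prompt, results)

    return answer
```

Because the parent is picked up from the current context, none of the inner steps has to pass span objects around explicitly.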

Docker Compose Stack

The MLOps stack runs alongside the main application:

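services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    # The image does not start a server on its own, so launch it explicitly.
    # A PostgreSQL backend also needs a driver such as psycopg2 in the image.
    command: mlflow server --host 0.0.0.0 --port 5000
    ports: ["5000:5000"]
    volumes:
      - mlflow-data:/mlflow
    environment:
      MLFLOW_BACKEND_STORE_URI: postgresql://...

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686", "4317:4317"]  # UI, OTLP gRPC

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      # The contrib image reads its config from /etc/otelcol-contrib by default
      - ./docker/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml

volumes:
  mlflow-data: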
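On the application side, both clients can be pointed at this stack through their standard environment variables. The sketch below assumes the ports published in the Compose file (5000 for MLFlow, 4317 for OTLP over gRPC) and a process running on the same host. The function name `configure_clients` and the service name are made up for illustration.

```python
import os


def configure_clients(env=None):
    """Set default endpoints for MLflow and the OTLP exporter, keeping existing values."""
    if env is None:
        env = os.environ

    defaults = {
        # Read by the MLflow client to locate the tracking server
        "MLFLOW_TRACKING_URI": "http://localhost:5000",
        # Read by OpenTelemetry OTLP exporters (gRPC port from the Compose file)
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
        # Hypothetical service name shown in the tracing UI
        "OTEL_SERVICE_NAME": "agentic-assistants",
    }
    for key, value in defaults.items():
        env.setdefault(key, value)  # never override what the deployment already set

    return {key: env[key] for key in defaults}
```

Using `setdefault` means a Kubernetes deployment can still inject its own endpoints without any code changes.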

Practical Benefits

With MLFlow and OpenTelemetry running, every experiment is reproducible (you can see the exact parameters used), every failure is diagnosable (traces show where things went wrong), and model promotion is safe (you can compare staging vs production metrics before switching).
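The staging versus production comparison can be as simple as a gate on one metric. Here is a minimal sketch, assuming the metrics for each version have already been fetched into plain dictionaries and that the metric is lower-is-better, as with eval_loss. The function name and the `min_improvement` threshold are illustrative assumptions.

```python
def should_promote(staging, production, metric="eval_loss", min_improvement=0.0):
    """Return True when the staging model beats production on a lower-is-better metric."""
    if metric not in staging:
        raise KeyError(f"staging metrics are missing {metric!r}")
    if metric not in production:
        # Nothing in production to compare against, so the candidate wins by default
        return True
    # Promote only if the improvement is strictly larger than the threshold
    return production[metric] - staging[metric] > min_improvement
```

A non-zero `min_improvement` guards against promoting a model whose gain is within run-to-run noise.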

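The overhead is small: OpenTelemetry adds single-digit milliseconds to each operation, and MLFlow logging can run asynchronously (it is synchronous by default; recent releases accept synchronous=False on the logging calls), which keeps it off the training loop's critical path.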
