Vector Sync Recipe: Dual-Writing to Milvus and ChromaDB

May 2, 2026 · 1 min read · RPi Kubernetes

How the vector sync pipeline in rpi_kubernetes coordinates chunking, embedding, and dual vector-store writes with audit logging.

RPi Kubernetes · Milvus · ChromaDB · Vector Search · Argo Workflows

Why Dual-Write Vectors

docs/data-pipeline-recipes.md includes a vector sync workflow that writes to both Milvus and ChromaDB.

That may look redundant, but it is a practical platform strategy:

ChromaDB for lightweight iteration
Milvus for higher-scale production-style behavior

Running both lets teams compare behavior without changing upstream ingestion logic.

Workflow Intent

The recipe performs four steps:

load the source payload from MinIO
chunk and embed the text
write the vectors into the target collection
persist audit metadata to PostgreSQL (pipeline_vector_audit)

The audit write is especially important because it gives traceability when vector quality or retrieval relevance drifts.
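The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the recipe's actual code: `InMemoryStore` is a hypothetical stand-in for the Milvus and ChromaDB clients, `fallback_embed` is a hash-based placeholder rather than a real embedding model, the text is assumed to have been loaded from MinIO already, and the fields of the returned audit record are assumptions about what a `pipeline_vector_audit` row might hold.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows with a small overlap."""
    step = size - overlap
    windows = (text[i:i + size] for i in range(0, len(text), step))
    return [w for w in windows if w.strip()]


def fallback_embed(chunk: str, dim: int = 8) -> list[float]:
    """Deterministic hash-based placeholder, NOT a real embedding model."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a Milvus or ChromaDB client."""
    name: str
    rows: dict = field(default_factory=dict)

    def upsert(self, ids, vectors, documents) -> int:
        for id_, vec, doc in zip(ids, vectors, documents):
            self.rows[id_] = (vec, doc)
        return len(ids)


def sync_vectors(source_key: str, text: str, stores: list, model_version: str) -> dict:
    """Chunk, embed, write to every store, and return an audit record."""
    chunks = chunk_text(text)
    vectors = [fallback_embed(c) for c in chunks]
    # IDs derived from source, position, and content keep reruns idempotent.
    ids = [
        hashlib.sha256(f"{source_key}:{i}:{c}".encode("utf-8")).hexdigest()[:16]
        for i, c in enumerate(chunks)
    ]
    written = {store.name: store.upsert(ids, vectors, chunks) for store in stores}
    # Assumed audit fields; the real table schema is not shown in the source.
    return {
        "source_key": source_key,
        "chunk_count": len(chunks),
        "model_version": model_version,
        "written": written,
        "synced_at": datetime.now(timezone.utc).isoformat(),
    }


stores = [InMemoryStore("milvus"), InMemoryStore("chromadb")]
audit = sync_vectors("raw/doc.txt", "word " * 100, stores, "fallback-v0")
print(audit["chunk_count"], audit["written"])
```

Because both stores receive the same IDs and vectors from one code path, any later difference in retrieval behavior can be attributed to the stores themselves rather than to ingestion.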

Hardening Work Left

The docs correctly call out two critical TODOs:

replace fallback embeddings with a production model endpoint
pin embedding model versions and define a re-embedding strategy

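Those two controls determine long-term retrieval consistency. Vectors produced by different models, or by different versions of the same model, are generally not comparable, so an unpinned model can quietly degrade search over an existing collection.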
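One way to make the pinning TODO concrete is to record the model identity alongside each audit row and compare it with the currently configured model. The sketch below is an assumption about how that could look: `EmbeddingPin`, the `model_name`, `model_version`, and `dimension` fields, and the `example-embedder` name are all hypothetical and do not come from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingPin:
    """Identity of the model that produced a set of vectors (hypothetical)."""
    model_name: str
    model_version: str
    dimension: int


def sources_to_reembed(audit_rows: list[dict], current: EmbeddingPin) -> list[str]:
    """Return source keys whose vectors came from a different pin."""
    stale = {
        row["source_key"]
        for row in audit_rows
        if EmbeddingPin(row["model_name"], row["model_version"], row["dimension"]) != current
    }
    return sorted(stale)


# Illustrative rows, shaped like an assumed audit record.
rows = [
    {"source_key": "raw/a.txt", "model_name": "example-embedder",
     "model_version": "1.0.0", "dimension": 384},
    {"source_key": "raw/b.txt", "model_name": "example-embedder",
     "model_version": "1.1.0", "dimension": 384},
]
current = EmbeddingPin("example-embedder", "1.1.0", 384)
print(sources_to_reembed(rows, current))
```

Treating any change in name, version, or dimension as a reason to re-embed is the conservative choice; a looser policy would need evidence that the two versions produce compatible vectors.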

Operational Value

This recipe makes vector ingestion measurable. Instead of hoping embeddings are fresh, operators can query audit tables and verify pipeline activity.

That closes a common observability gap in RAG systems.
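As an illustration of that kind of check, the sketch below finds sources whose latest sync to a given store is older than a cutoff. It uses an in-memory SQLite database purely as a runnable stand-in for PostgreSQL, and every column name is an assumption, since the source only gives the table name `pipeline_vector_audit`.

```python
import sqlite3

# Assumed columns; only the table name appears in the source.
SCHEMA = """
CREATE TABLE pipeline_vector_audit (
    source_key    TEXT    NOT NULL,
    target_store  TEXT    NOT NULL,
    chunk_count   INTEGER NOT NULL,
    model_version TEXT    NOT NULL,
    synced_at     TEXT    NOT NULL
)
"""

# "?" is SQLite's placeholder; a PostgreSQL driver such as psycopg uses "%s".
STALE_SOURCES = """
SELECT source_key, target_store, MAX(synced_at) AS last_sync
FROM pipeline_vector_audit
GROUP BY source_key, target_store
HAVING MAX(synced_at) < ?
ORDER BY source_key, target_store
"""


def find_stale(conn: sqlite3.Connection, cutoff_iso: str) -> list[tuple]:
    """Return (source, store, last_sync) rows last synced before the cutoff."""
    return conn.execute(STALE_SOURCES, (cutoff_iso,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO pipeline_vector_audit VALUES (?, ?, ?, ?, ?)",
    [
        ("docs/a.md", "milvus", 12, "v1", "2026-04-20T00:00:00Z"),
        ("docs/a.md", "chromadb", 12, "v1", "2026-05-01T00:00:00Z"),
        ("docs/b.md", "milvus", 7, "v1", "2026-05-01T00:00:00Z"),
        ("docs/b.md", "chromadb", 7, "v1", "2026-05-01T00:00:00Z"),
    ],
)
print(find_stale(conn, "2026-04-25T00:00:00Z"))
```

Grouping by both source and store surfaces the case that matters most in a dual-write setup: one store lagging behind the other for the same source.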

Practical Takeaway

If your retrieval stack spans dev and prod vector stores, build one sync pipeline with explicit audit writes. You will need that history when quality investigations start.
