Vector Sync Recipe: Dual-Writing to Milvus and ChromaDB
How the vector sync pipeline in rpi_kubernetes coordinates chunking, embedding, and dual vector-store writes with audit logging.
Why Dual-Write Vectors
docs/data-pipeline-recipes.md includes a vector sync workflow that writes to both Milvus and ChromaDB.
Writing the same vectors to two stores may look redundant, but it is a practical platform strategy:
- ChromaDB for lightweight iteration
- Milvus for higher-scale production-style behavior
Running both lets teams compare behavior without changing upstream ingestion logic.
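A minimal sketch of the dual-write idea, assuming both stores sit behind one small interface. VectorStore, InMemoryStore, VectorRecord, and dual_write are hypothetical names. The in-memory class stands in for real Milvus and ChromaDB clients, whose APIs differ and are not modeled here. Ingestion produces one batch, and the sink fans it out to every store while recording a per-store outcome, so one store's failure is visible rather than fatal.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass(frozen=True)
class VectorRecord:
    id: str
    vector: tuple[float, ...]
    text: str


class VectorStore(Protocol):
    """Hypothetical minimal interface; real Milvus and ChromaDB clients differ."""

    name: str

    def upsert(self, collection: str, records: Sequence[VectorRecord]) -> int: ...


@dataclass
class InMemoryStore:
    """Illustrative stand-in for a vector store client."""

    name: str
    collections: dict[str, dict[str, VectorRecord]] = field(default_factory=dict)

    def upsert(self, collection: str, records: Sequence[VectorRecord]) -> int:
        bucket = self.collections.setdefault(collection, {})
        for rec in records:
            bucket[rec.id] = rec  # same id overwrites, so a retried batch does not duplicate
        return len(records)


def dual_write(
    stores: Sequence[VectorStore],
    collection: str,
    records: Sequence[VectorRecord],
) -> dict[str, int | str]:
    """Send one batch to every store; capture each outcome instead of failing fast."""
    results: dict[str, int | str] = {}
    for store in stores:
        try:
            results[store.name] = store.upsert(collection, records)
        except Exception as exc:  # one store being down should not hide the other's result
            results[store.name] = f"error: {exc}"
    return results


milvus, chroma = InMemoryStore("milvus"), InMemoryStore("chromadb")
batch = [VectorRecord("doc1#0", (0.1, 0.2), "hello")]
print(dual_write([milvus, chroma], "docs", batch))  # {'milvus': 1, 'chromadb': 1}
```

Keeping each store's outcome separate makes it possible to record which store actually received a batch, rather than assuming both did.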
Workflow Intent
The recipe performs four steps:
- load the source payload from MinIO
- chunk and embed the text
- write the vectors into the target collection
- persist audit metadata to PostgreSQL (pipeline_vector_audit)
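The chunk and embed steps might look like the following sketch. It assumes fixed-size character windows with overlap (one common chunking approach; the recipe's actual strategy is not shown here) and uses a deterministic hash-based vector as a placeholder. chunk_text, placeholder_embed, and build_records are hypothetical; placeholder_embed carries no semantic meaning and only marks where a fallback embedding would slot in until a real model endpoint replaces it. Loading the payload from MinIO is omitted, and source_key is assumed to be the object key.

```python
from __future__ import annotations

import hashlib
import math


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window reached the end of the text
            break
    return chunks


def placeholder_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical fallback: a deterministic unit vector derived from a SHA-256 hash.

    Identical text always yields the identical vector, but similar text does NOT
    yield similar vectors, so this is useless for real retrieval.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # No byte value maps to exactly 0 here, so the norm below is never zero.
    raw = [digest[i % len(digest)] / 255.0 - 0.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def build_records(source_key: str, text: str) -> list[dict]:
    """Chunk, embed, and key each chunk as '<source_key>#<index>' (hypothetical id scheme)."""
    return [
        {"id": f"{source_key}#{i}", "text": chunk, "vector": placeholder_embed(chunk)}
        for i, chunk in enumerate(chunk_text(text))
    ]


records = build_records("raw/notes.txt", "a" * 450)
print(len(records))  # 3
```

Deriving chunk ids from the source key and chunk index keeps re-runs over the same object idempotent in stores that upsert by id.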
The audit write is especially important because it gives traceability when vector quality or retrieval relevance drifts.
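A sketch of the audit write, using Python's built-in sqlite3 as a local stand-in for PostgreSQL. The pipeline_vector_audit table name comes from the recipe, but the columns below are hypothetical, as is the outcomes shape (store name mapped to a written-chunk count or an error string). Writing one row per store per run keeps the two stores' histories separate.

```python
from __future__ import annotations

import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical columns; only the table name comes from the recipe.
AUDIT_SCHEMA = """
CREATE TABLE IF NOT EXISTS pipeline_vector_audit (
    run_id          TEXT    NOT NULL,
    source_key      TEXT    NOT NULL,
    target_store    TEXT    NOT NULL,
    collection      TEXT    NOT NULL,
    chunk_count     INTEGER NOT NULL,
    embedding_model TEXT    NOT NULL,
    status          TEXT    NOT NULL,
    written_at      TEXT    NOT NULL
)
"""


def record_run(
    conn: sqlite3.Connection,
    source_key: str,
    collection: str,
    embedding_model: str,
    outcomes: dict[str, int | str],
) -> str:
    """Insert one audit row per store for a single sync run and return the run id."""
    run_id = str(uuid.uuid4())
    written_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for store, outcome in outcomes.items():
        ok = isinstance(outcome, int)
        rows.append((
            run_id, source_key, store, collection,
            outcome if ok else 0,          # chunks written; 0 when the store failed
            embedding_model,
            "ok" if ok else str(outcome),  # keep the error text for later investigation
            written_at,
        ))
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO pipeline_vector_audit VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
        )
    return run_id


conn = sqlite3.connect(":memory:")
conn.execute(AUDIT_SCHEMA)
record_run(conn, "raw/notes.txt", "docs", "fallback@v0",
           {"milvus": 3, "chromadb": "error: timeout"})
```

Against PostgreSQL, the same INSERT would use the driver's placeholder style (for example %s with psycopg) and would more naturally store written_at in a timestamptz column.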
Hardening Work Left
The docs correctly call out two critical TODOs:
- replace the fallback embeddings with a production model endpoint
- pin embedding model versions and define a re-embedding strategy
Those two controls determine long-term retrieval consistency.
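One way to make model pinning enforceable, sketched under assumptions: each vector carries a tag naming the model and version that produced it, and a frozen EmbeddingSpec holds the pinned values. EmbeddingSpec, its tag format, and needs_reembedding are hypothetical. Vectors from different models (or model versions) generally do not share an embedding space, so comparing a new query vector against old document vectors gives unreliable similarity scores. A version bump should therefore produce an explicit re-embedding list instead of silently mixing vectors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    """Hypothetical pinned embedding configuration."""

    model: str
    version: str  # an explicit revision; floating aliases are rejected below
    dim: int

    def __post_init__(self) -> None:
        if not self.version or self.version.lower() == "latest":
            raise ValueError("pin an explicit model version, not 'latest'")
        if self.dim <= 0:
            raise ValueError("dim must be positive")

    @property
    def tag(self) -> str:
        """Identifier stored next to every vector this spec produces."""
        return f"{self.model}@{self.version}"


def needs_reembedding(stored_tags: dict[str, str], pinned: EmbeddingSpec) -> list[str]:
    """Return ids of vectors whose recorded model tag differs from the pinned spec."""
    return sorted(vid for vid, tag in stored_tags.items() if tag != pinned.tag)


pinned = EmbeddingSpec(model="example-embed", version="v2", dim=384)
stored = {"doc1#0": "example-embed@v2", "doc1#1": "fallback@v0", "doc2#0": "example-embed@v1"}
print(needs_reembedding(stored, pinned))  # ['doc1#1', 'doc2#0']
```

Vectors produced by the fallback path show up in the same list, so swapping in a production endpoint and bumping a version are handled by one mechanism.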
Operational Value
This recipe makes vector ingestion measurable. Instead of hoping embeddings are fresh, operators can query audit tables and verify pipeline activity.
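A sketch of the kind of check operators might run, again with sqlite3 standing in for PostgreSQL. It assumes the audit table has target_store, collection, status, and written_at columns (hypothetical names) with timestamps stored as UTC ISO-8601 text, and stale_targets is a hypothetical helper. It finds the latest successful write per store and collection and flags any older than a freshness threshold; a recent failed run does not mask staleness because only 'ok' rows count.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timedelta, timezone


def stale_targets(
    conn: sqlite3.Connection, max_age: timedelta, now: datetime
) -> list[tuple[str, str, str]]:
    """List (store, collection, last_ok_write) tuples whose latest successful write is too old.

    Assumes written_at holds UTC ISO-8601 text in one consistent format, so the
    string MAX() also picks the latest timestamp.
    """
    rows = conn.execute(
        """
        SELECT target_store, collection, MAX(written_at)
        FROM pipeline_vector_audit
        WHERE status = 'ok'
        GROUP BY target_store, collection
        """
    ).fetchall()
    cutoff = now - max_age
    return sorted(
        (store, coll, last)
        for store, coll, last in rows
        if datetime.fromisoformat(last) < cutoff
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_vector_audit "
    "(target_store TEXT, collection TEXT, status TEXT, written_at TEXT)"
)
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)  # fixed example clock
conn.executemany(
    "INSERT INTO pipeline_vector_audit VALUES (?, ?, ?, ?)",
    [
        ("milvus", "docs", "ok", (now - timedelta(hours=1)).isoformat()),
        ("chromadb", "docs", "ok", (now - timedelta(days=3)).isoformat()),
        ("chromadb", "docs", "error: timeout", (now - timedelta(minutes=5)).isoformat()),
    ],
)
print(stale_targets(conn, timedelta(days=1), now))
# [('chromadb', 'docs', '2024-04-28T12:00:00+00:00')]
```

A store and collection pair that has never logged a successful write does not appear in this result at all, so a production check would also compare against the list of expected collections.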
That closes a common observability gap in RAG systems.
Practical Takeaway
If your retrieval stack spans dev and prod vector stores, build one sync pipeline with explicit audit writes. You will need that history when quality investigations start.