Pipeline Recipe 1: Raw Ingest to MinIO with Argo

April 30, 2026 · 1 min read · RPi Kubernetes

How the raw ingest workflow template in rpi_kubernetes moves source data into immutable MinIO paths for downstream processing.

Tags: RPi Kubernetes, Argo Workflows, MinIO, Data Pipeline, Ingestion

Why Start With Raw Ingest

The most reliable data platform pattern is still:

capture raw first, transform later.

docs/data-pipeline-recipes.md puts that pattern into practice with the pipeline-raw-ingest Argo WorkflowTemplate, which supports multiple source types and writes immutable objects to MinIO.

What The Recipe Does

The workflow captures source payloads from HTTP, REST, S3, or filesystem sources and stores them under controlled prefixes in a target bucket such as dagster-artifacts.

Example invocation from the docs:

argo submit --from workflowtemplate/pipeline-raw-ingest -n mlops \
  -p source_type=http \
  -p source_uri=https://example.com/data.json \
  -p output_prefix=raw/manual
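The post does not show how the template lays out objects under output_prefix, so the following is only a minimal sketch of one way to derive an immutable key. The function name build_raw_key, the date-partitioned layout, and the 16-character digest prefix are all assumptions made for illustration, not the template's actual scheme.

```python
import hashlib
from datetime import datetime, timezone
from urllib.parse import urlparse


def build_raw_key(output_prefix: str, source_uri: str, payload: bytes,
                  fetched_at: datetime) -> str:
    """Hypothetical helper: derive an object key from content and fetch day."""
    # Hash the payload so different content never maps to the same key.
    digest = hashlib.sha256(payload).hexdigest()
    # Use the last path segment of the URI as a readable name, else "payload".
    name = urlparse(source_uri).path.rsplit("/", 1)[-1] or "payload"
    # Partition by UTC day so listings stay small (assumed layout).
    day = fetched_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"{output_prefix.strip('/')}/{day}/{digest[:16]}-{name}"


key = build_raw_key(
    "raw/manual",
    "https://example.com/data.json",
    b'{"hello": "world"}',
    datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc),
)
print(key)
```

Because the key depends only on the prefix, the fetch day, and the payload bytes, re-ingesting identical content on the same day resolves to the same key rather than creating a second copy.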

Why This Pattern Scales

Raw persistence gives you:

replay capability
auditability
easier bug triage
clean separation between ingestion and transformation concerns

On constrained clusters, that separation is even more important because retries and partial failures are common: a failed transform can be rerun from the stored raw object without pulling from the source again.
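One way to keep retries harmless is write-once semantics on the landing zone. The sketch below uses an in-memory stand-in rather than a real MinIO client, and the class InMemoryBucket and its put_if_absent method are hypothetical names; how the actual template enforces immutability is not described in the source.

```python
class InMemoryBucket:
    """Stand-in for an object store bucket; this is not a MinIO client."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_if_absent(self, key: str, data: bytes) -> bool:
        """Store data under key unless it already exists. Return True if written."""
        if key in self._objects:
            # A retry after a partial failure lands here and changes nothing.
            return False
        self._objects[key] = data
        return True

    def get(self, key: str) -> bytes:
        return self._objects[key]


bucket = InMemoryBucket()
first = bucket.put_if_absent("raw/manual/a.json", b"v1")
retry = bucket.put_if_absent("raw/manual/a.json", b"v2")
print(first, retry, bucket.get("raw/manual/a.json"))
```

The retried write is rejected and the original bytes survive, which is the property that makes replay and audit trustworthy.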

Hardening Priorities

The docs already call out the right TODOs:

pagination by source type
dead-letter paths for failed pulls
retention policies for raw zones

I would also log a checksum of each source payload, which supports deduplication and forensic traceability.
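A rough sketch of how dead-letter paths and checksum logging could fit together is shown below. Everything here is an assumption for illustration: the function ingest_with_dead_letter, the dead-letter/ prefix, and the record fields are hypothetical, and a plain dict stands in for the bucket.

```python
import hashlib
import json
from typing import Callable


def ingest_with_dead_letter(fetch: Callable[[], bytes], store: dict,
                            key: str, source_uri: str) -> dict:
    """Fetch once; store the payload on success or a dead-letter record on failure."""
    try:
        payload = fetch()
    except Exception as exc:  # broad on purpose: every failed pull is recorded
        record = {"status": "failed", "source_uri": source_uri, "error": repr(exc)}
        # Assumed layout: failures live under their own prefix, away from raw data.
        store[f"dead-letter/{key}.error.json"] = json.dumps(record).encode()
        return record
    record = {
        "status": "ok",
        "source_uri": source_uri,
        "key": key,
        # The checksum is what a later dedup or forensic check would compare.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "bytes": len(payload),
    }
    store[key] = payload
    return record


def failing_fetch() -> bytes:
    raise TimeoutError("upstream timed out")


store: dict = {}
ok = ingest_with_dead_letter(lambda: b'{"id": 1}', store,
                             "raw/manual/data.json", "https://example.com/data.json")
failed = ingest_with_dead_letter(failing_fetch, store,
                                 "raw/manual/other.json", "https://example.com/other.json")
print(json.dumps(ok), json.dumps(failed))
```

Keeping failure records under a separate prefix means the raw zone only ever contains complete payloads, while failed pulls remain inspectable and retryable.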

Practical Takeaway

If your pipelines keep failing in unpredictable ways, start by making raw ingest deterministic and observable. Everything else becomes easier once the landing zone is stable.
