Julian Wiley

CDC Sync on k3s: Watermarks, Deltas, and Replay Windows

May 3, 2026 · 1 min read · RPi Kubernetes

How the CDC sync workflow in rpi_kubernetes handles incremental extraction and what to harden for reliable long-running operation.

RPi Kubernetes · CDC · Data Pipeline · PostgreSQL · Argo Workflows

Why CDC Matters Here

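Full batch reloads are simple, but they stop scaling operationally once source tables change frequently: every run re-copies mostly unchanged rows, so run time tracks table size rather than the volume of changes.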

The pipeline-cdc-sync recipe in docs/data-pipeline-recipes.md introduces incremental behavior using primary key and updated-at watermarks.
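To make the watermark idea concrete, here is a minimal sketch of incremental selection in plain Python. It is not the recipe's implementation: the function name `extract_changes` and the in-memory `source_events` list are assumptions for illustration, and a real run would push the same comparison into a PostgreSQL query. The sketch compares on the pair (updated_at, id) so that rows sharing a timestamp are neither skipped nor read twice.

```python
from datetime import datetime, timezone


def extract_changes(rows, watermark, updated_at_column="updated_at", primary_key="id"):
    """Return rows strictly after the (updated_at, id) watermark, plus the new watermark.

    Hypothetical helper: `rows` stands in for the source table.
    """
    def position(row):
        return (row[updated_at_column], row[primary_key])

    changed = sorted((r for r in rows if position(r) > watermark), key=position)
    # If nothing changed, the watermark stays where it was.
    new_watermark = position(changed[-1]) if changed else watermark
    return changed, new_watermark


def at_minute(minute):
    return datetime(2026, 5, 3, 12, minute, tzinfo=timezone.utc)


# Illustrative data only; ids 2 and 3 share a timestamp on purpose.
source_events = [
    {"id": 1, "updated_at": at_minute(0)},
    {"id": 2, "updated_at": at_minute(5)},
    {"id": 3, "updated_at": at_minute(5)},
]

first_batch, watermark = extract_changes(source_events, (at_minute(0), 1))
second_batch, watermark_after = extract_changes(source_events, watermark)
```

The first call picks up ids 2 and 3; the second call finds nothing new and leaves the watermark unchanged.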

Recipe Flow

The documented run path:

argo submit --from workflowtemplate/pipeline-cdc-sync -n mlops \
  -p source_table=source_events \
  -p primary_key=id \
  -p updated_at_column=updated_at

State is tracked in pipeline_cdc_state, and changed data is persisted to MinIO plus sink targets.
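One property worth checking in any state-tracking design is the order of operations: if the watermark advances before the sink write succeeds, a failed run silently drops rows. The sketch below shows the safer ordering under stated assumptions. The `state` dict is only a stand-in for pipeline_cdc_state, and `run_sync`, `extract`, and `failing_sink` are hypothetical names, not part of the recipe.

```python
def run_sync(extract, write_sink, state, table):
    """Advance the stored watermark only after the sink write succeeds."""
    watermark = state[table]
    rows, new_watermark = extract(watermark)
    write_sink(rows)              # may raise; state is untouched in that case
    state[table] = new_watermark  # commit the watermark last
    return len(rows)


# Stand-in for pipeline_cdc_state; integer watermarks keep the demo short.
state = {"source_events": 10}


def extract(watermark):
    rows = [i for i in range(1, 16) if i > watermark]
    return rows, max(rows, default=watermark)


def failing_sink(rows):
    raise OSError("sink unavailable")


try:
    run_sync(extract, failing_sink, state, "source_events")
except OSError:
    pass
watermark_after_failure = state["source_events"]

sink = []
written = run_sync(extract, sink.extend, state, "source_events")
```

After the failed run the watermark is still 10, so the retry re-extracts the same rows instead of skipping them. The cost is that a partially written batch can be replayed, which is why the sink has to tolerate duplicates.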

Why Watermarks Are Tricky

CDC pipelines fail in subtle ways:

  • late-arriving records
  • clock skew
  • update collisions

That is why replay windows and conflict policy need an explicit owner and a written rule, rather than being left implicit in the extraction query. The recipe notes this as hardening work, which is exactly right.
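A small sketch of the late-arrival case shows why a replay window matters. It assumes a row whose updated_at is earlier than the stored watermark but which only became visible after the previous run, for example because of a long transaction or a skewed clock. The helper `rows_since` and the five-minute overlap are illustrative choices, not values from the recipe.

```python
from datetime import datetime, timedelta, timezone


def rows_since(rows, watermark, overlap=timedelta(0)):
    """Select rows with updated_at later than (watermark - overlap)."""
    lower_bound = watermark - overlap
    return [r for r in rows if r["updated_at"] > lower_bound]


base = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
watermark = base + timedelta(minutes=10)  # highest updated_at the previous run saw

already_synced = {"id": 6, "updated_at": base + timedelta(minutes=9)}
# Stamped at 12:08, but not visible until after the previous run finished.
late_row = {"id": 7, "updated_at": base + timedelta(minutes=8)}
new_row = {"id": 8, "updated_at": base + timedelta(minutes=12)}
table = [already_synced, late_row, new_row]

strict = rows_since(table, watermark)
replayed = rows_since(table, watermark, overlap=timedelta(minutes=5))
```

The strict query returns only id 8 and never sees id 7. The overlapping query recovers id 7, but it also re-reads id 6, which was already synced, so a replay window is only safe when sink writes are idempotent.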

Operational Guidance

I recommend adding:

  1. overlap windows on extraction
  2. idempotent sink writes
  3. explicit upsert vs append rules by table
  4. alerting when watermark staleness exceeds a threshold

Without these, "incremental" eventually turns into data drift.
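The sink-side items in the list can be sketched as follows. Everything here is hypothetical: the `WRITE_MODE` table, the `audit_log` table name, the dict-based sink, and the one-hour threshold are assumptions chosen to keep the example self-contained, not settings from the recipe.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-table policy: each table is declared upsert or append up front.
WRITE_MODE = {"source_events": "upsert", "audit_log": "append"}


def write_rows(sink, table, rows, primary_key="id"):
    """Write rows according to the table's declared mode."""
    mode = WRITE_MODE[table]  # an unlisted table raises KeyError on purpose
    if mode == "upsert":
        target = sink.setdefault(table, {})
        for row in rows:
            current = target.get(row[primary_key])
            # Keep the newer version so replayed rows cannot regress the sink.
            if current is None or row["updated_at"] >= current["updated_at"]:
                target[row[primary_key]] = row
    else:
        sink.setdefault(table, []).extend(rows)


def watermark_is_stale(watermark, now, threshold):
    """True when the watermark has not advanced within the threshold."""
    return now - watermark > threshold


t0 = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
batch = [{"id": 1, "updated_at": t0, "status": "new"}]

sink = {}
for _ in range(2):  # the second pass simulates a replayed batch
    write_rows(sink, "source_events", batch)
    write_rows(sink, "audit_log", batch)

stale = watermark_is_stale(t0, t0 + timedelta(hours=2), timedelta(hours=1))
fresh = watermark_is_stale(t0, t0 + timedelta(minutes=30), timedelta(hours=1))
```

Replaying the batch leaves one row in the upsert table but two in the append table, which is the difference the per-table rule is meant to make explicit. The staleness check is the kind of predicate an alert could evaluate against the stored watermark.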

Practical Takeaway

CDC is less about extraction code and more about state correctness over time. Keep state visible, test replay paths, and treat watermarks like production control data.
