DataHub, Iceberg, and Metadata Bridge Jobs on a Homelab Cluster

April 26, 2026 · 1 min read · RPi Kubernetes

How DataHub integration in rpi_kubernetes evolved with ingestion cronjobs, bridge configmaps, and Iceberg-oriented governance patterns.

RPi Kubernetes, DataHub, Iceberg, Metadata, Data Engineering

Why Add DataHub Here

Object storage and pipelines are useful, but without metadata you eventually lose discoverability.

The rpi_kubernetes updates added a meaningful DataHub footprint under kubernetes/base-services/datahub/, including:

ingestion cronjobs
ingestion recipe configmaps
metadata bridge config
values files for prerequisites and DataHub itself
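
As a rough illustration of what an ingestion recipe configmap can look like, here is a minimal sketch for a Postgres source. The ConfigMap name, namespace, database name, and service hostnames are hypothetical placeholders, not values taken from the repository; the recipe keys (source type postgres, sink type datahub-rest) follow the standard DataHub recipe layout.

```yaml
# Hypothetical sketch: names, namespace, and hosts are assumptions, not the repo's actual values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: datahub-recipe-postgres      # assumed name
  namespace: datahub                 # assumed namespace
data:
  recipe.yml: |
    source:
      type: postgres
      config:
        # Assumed in-cluster service address for Postgres
        host_port: "postgres.databases.svc.cluster.local:5432"
        database: appdb              # assumed database name
        # Credentials are expanded from environment variables at run time
        username: "${POSTGRES_USER}"
        password: "${POSTGRES_PASSWORD}"
    sink:
      type: datahub-rest
      config:
        # Assumed GMS service address; the real name depends on the Helm release
        server: "http://datahub-gms.datahub.svc.cluster.local:8080"
```

Keeping the recipe in a ConfigMap means the ingestion logic can change without rebuilding an image, and credentials stay out of the recipe because they come in through environment variables.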

What Changed Operationally

The key shift was from a static, deploy-once installation to ingestion that runs on a recurring schedule.

Files like:

cronjob-ingest-postgres.yaml
cronjob-ingest-minio-s3.yaml
cronjob-ingest-mlflow.yaml
cronjob-metadata-bridge.yaml

show that metadata refresh is now treated as an ongoing service, not a one-time setup task.
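
A recurring ingestion job of this kind might look roughly like the sketch below. This is not the content of cronjob-ingest-postgres.yaml; the schedule, namespace, Secret name, ConfigMap name, and image tag are all assumptions chosen for illustration, and the image is assumed to be available for the cluster's CPU architecture.

```yaml
# Hypothetical sketch of an hourly ingestion CronJob; all names and the schedule are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: datahub-ingest-postgres
  namespace: datahub                       # assumed namespace
spec:
  schedule: "0 * * * *"                    # assumed: once per hour, on the hour
  concurrencyPolicy: Forbid                # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: ingest
              # Assumed image and tag; pin a specific version in practice
              image: acryldata/datahub-ingestion:head
              command: ["datahub", "ingest", "-c", "/recipes/recipe.yml"]
              envFrom:
                - secretRef:
                    name: datahub-postgres-credentials   # assumed Secret with POSTGRES_USER / POSTGRES_PASSWORD
              volumeMounts:
                - name: recipe
                  mountPath: /recipes
          volumes:
            - name: recipe
              configMap:
                name: datahub-recipe-postgres            # assumed ConfigMap holding recipe.yml
```

With concurrencyPolicy set to Forbid, a slow run on a constrained node delays the next refresh instead of stacking overlapping jobs, which is usually the safer trade on a small cluster.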

Why This Is A Good Pattern

On small clusters, operational simplicity matters. Cron-based ingestion gives predictable behavior and easy debugging while still enabling a governance layer across storage and ML systems.

It also aligns with the platform direction toward DataHub + Iceberg catalog patterns, making later lineage and discovery workflows easier.

What I Would Harden Next

For production-grade behavior, I would add:

explicit retry and dead-letter handling for ingestion jobs
richer run observability tied to Prometheus/Grafana
metadata freshness SLOs per source

The existing manifest structure makes these additions straightforward.
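
One way the hardening items above could be layered onto the manifests is sketched below. It is an assumption-heavy illustration: it presumes kube-state-metrics and the Prometheus Operator (for the PrometheusRule resource) are installed, and the namespace, job name pattern, and two-hour freshness threshold are hypothetical. Kubernetes has no built-in dead-letter queue for Jobs, so retained failed Jobs stand in as a crude substitute here.

```yaml
# Hypothetical hardening sketch; thresholds, names, and namespace are assumptions.
# Fragment 1: retry and failure-retention settings to merge into an existing CronJob spec.
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 3          # keep failed Jobs around for inspection
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 3                # retry a failing pod up to 3 times
      activeDeadlineSeconds: 1800    # give up on a run after 30 minutes
---
# Fragment 2: freshness and failure alerts built on kube-state-metrics series.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: datahub-ingest-freshness
  namespace: datahub
spec:
  groups:
    - name: datahub-ingestion
      rules:
        - alert: DataHubIngestionStale
          # Fires when a matching CronJob has not succeeded for more than 2 hours (assumed SLO)
          expr: time() - kube_cronjob_status_last_successful_time{namespace="datahub", cronjob=~"datahub-ingest-.*"} > 7200
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "DataHub ingestion {{ $labels.cronjob }} has not succeeded within its freshness window"
        - alert: DataHubIngestionJobFailed
          # Fires when any Job in the namespace reports failed pods
          expr: kube_job_status_failed{namespace="datahub"} > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Job {{ $labels.job_name }} in the datahub namespace has failed pods"
```

A per-source freshness SLO would follow from giving each CronJob its own rule with its own threshold, since a Postgres catalog and an MLflow registry rarely need the same refresh cadence.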

Practical Takeaway

If you already run MinIO, Postgres, and MLflow in-cluster, DataHub ingestion cronjobs are one of the highest-leverage additions for long-term maintainability.
