Julian Wiley

DataHub, Airbyte, Polaris, and the small-cluster catalog

January 6, 2026 · 1 min read · RPi Kubernetes

How catalog, ingestion, and Iceberg-facing services fit into the homelab.

RPi Kubernetes · Systems Design · Local First · Development Timeline

Why this mattered

A data platform is not only storage; it needs lineage, discovery, movement, and table conventions.

This belongs in the development timeline because RPi Kubernetes is not a single feature. It is a hybrid k3s homelab with an Ubuntu control plane, four Raspberry Pi 5 workers, Cloudflare Tunnel, and a data platform made from Kafka, Flink, Redis Stack, MinIO, DataHub, Airbyte, Polaris, and observability services. The project only became useful once its infrastructure decisions were written down well enough to be repeated.

Design decision

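DataHub, Airbyte, and Polaris each fill a different one of those needs: DataHub covers discovery and lineage, Airbyte covers data movement, and Polaris is the Iceberg-facing catalog that anchors table conventions. Each keeps its slot in the design even when not every component runs all the time.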

The practical stack around this decision includes k3s, Kustomize, Helm, Strimzi Kafka, Flink Operator, Redis Stack, RAGFlow, DataHub, Airbyte, Polaris, MinIO, Prometheus, Grafana, Loki, OpenTelemetry, Cloudflare Tunnel, FastAPI, and Next.js. I try to keep the interfaces small: configuration describes intent, runtime code owns behavior, and operational notes explain what a future maintainer should check first.

What I would repeat

The practical lesson is to document intent and state separately: desired architecture, deployed services, and suspended jobs are not the same thing.
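One way to keep intent and state apart is to record them as separate fields, so a suspended service still shows up in the design. The sketch below is only an illustration: the Component and State names, the report helpers, and the states assigned to each service are hypothetical and do not come from the actual repository.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Observed state of a component, tracked separately from its role."""
    RUNNING = "running"
    SUSPENDED = "suspended"
    NOT_DEPLOYED = "not deployed"


@dataclass(frozen=True)
class Component:
    name: str
    role: str      # intent: the slot this component fills in the design
    state: State   # state: what is actually true on the cluster today


# Hypothetical inventory; the states here are made up for illustration.
PLATFORM = [
    Component("DataHub", "discovery and lineage", State.SUSPENDED),
    Component("Airbyte", "data movement", State.NOT_DEPLOYED),
    Component("Polaris", "Iceberg table catalog", State.RUNNING),
]


def report(components):
    """Return one line per component showing intent and state side by side."""
    return [f"{c.name}: {c.role} [{c.state.value}]" for c in components]


def not_running(components):
    """Names of components that are in the design but not running."""
    return [c.name for c in components if c.state is not State.RUNNING]


if __name__ == "__main__":
    for line in report(PLATFORM):
        print(line)
    print("designed but not running:", ", ".join(not_running(PLATFORM)))
```

The point of the split is that changing a state never requires editing the role, so the architecture notes stay stable while the deployment changes.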

The repeatable pattern is to make the boring path explicit. For this project that means clear repository boundaries, documented setup, predictable deployment commands, and enough observability to know whether the system is healthy or merely quiet.
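The difference between healthy and merely quiet can be made concrete with a small check that treats a missing signal as its own outcome rather than as success. This is a minimal sketch under assumptions: the classify function, its parameters, and the 300-second threshold are hypothetical, and it does not call any real Prometheus or Loki API.

```python
def classify(seconds_since_last_signal, error_count, max_silence=300):
    """Classify a service from two observations.

    seconds_since_last_signal: age of the newest metric or log line, or
        None if nothing has ever been received.
    error_count: errors seen in the current window.
    max_silence: hypothetical threshold (seconds) after which silence
        stops counting as evidence of health.
    """
    if seconds_since_last_signal is None or seconds_since_last_signal > max_silence:
        # No recent signal: the absence of errors proves nothing.
        return "quiet"
    if error_count > 0:
        return "unhealthy"
    return "healthy"


if __name__ == "__main__":
    print(classify(12, 0))     # recent signal, no errors
    print(classify(12, 3))     # recent signal, with errors
    print(classify(4000, 0))   # no errors, but also no recent signal
    print(classify(None, 0))   # never reported at all
```

Checking silence first is deliberate: a service that stopped reporting would otherwise pass the error check by default.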

Reader takeaway

If you are building something similar, start with the workflow you need to repeat every week. Then add only the platform pieces that make that workflow easier to recover, explain, and extend.