Julian Wiley

Drive discovery to Iceberg

April 7, 2026· 1 min readAgentic Assistants

How local file discovery becomes cataloged data instead of a one-off scan.

Agentic AssistantsSystems DesignLocal FirstDevelopment Timeline

Why this mattered

Unstructured drives are full of datasets, notebooks, exports, and project artifacts.

This belongs in the development timeline because Agentic Assistants is not a single feature. It is a local-first assistant framework with a CLI, FastAPI and WebSocket server, MCP bridge, Next.js control panel, indexing, scoped retrieval, knowledge bases, pipelines, discovery, and training workflows. The project only became useful once its infrastructure decisions were written down well enough to be repeated.

Design decision

Discovery turns that mess into manifests, format detection, provenance, and eventually Iceberg-backed tables.

The practical stack around this decision includes Python, Poetry, Click, FastAPI, WebSockets, MCP, LanceDB, Chroma, DuckDB, Polars, PyArrow, CrewAI, LangChain, LangGraph, Ollama, MLflow, OpenTelemetry, Next.js, Docusaurus. I try to keep the interfaces small: configuration describes intent, runtime code owns behavior, and operational notes explain what a future maintainer should check first.

What I would repeat

The value is not only loading data; it is remembering what was found and how it was classified.

The repeatable pattern is to make the boring path explicit. For this project that means clear repository boundaries, documented setup, predictable deployment commands, and enough observability to know whether the system is healthy or merely quiet.

Reader takeaway

If you are building something similar, start with the workflow you need to repeat every week. Then add only the platform pieces that make that workflow easier to recover, explain, and extend.