Pacific Slate

A self-hosted, production-grade multi-agent AI platform — architected and operated by Ryan Thompson.

Pacific Slate is a personal AI system I designed and run on a dedicated server. It routes work across role-specialized models, calls a fleet of tool-servers over the Model Context Protocol (MCP), retrieves over a private knowledge corpus, and runs proactive routines on a schedule — all as containerized micro-services with cost governance, observability, and automated health-gated deploys.

It is not a demo or a course project. It is a live system I use daily and have operated under real constraints — cost ceilings, model outages, uptime, and security. I built it to prove I can take agentic AI from architecture to a system that runs every day under real constraints.

A note on what this is and isn’t. I’m an operator and systems architect, not a career software engineer. I designed the architecture, made the model-routing / reliability / cost decisions, and implemented the system AI-natively (with coding assistants like Claude Code). That is the point, not a caveat: this is what building looks like in 2026, and it’s how one person ships and runs a system this size. Everything below is something I can walk through and defend.

At a glance

58 containerized services orchestrated with Docker Compose on a dedicated server (Cloudflare Tunnel ingress, segmented Docker networks).
Role-specialized multi-LLM routing — 8 specialist agent roles, each pinned to the model best suited (and priced) for its job, with automatic fallback chains so a single provider outage doesn’t take the system down.
23 MCP tool-servers behind a single gateway — memory, knowledge/RAG, code execution, git, browser/screenshot, infrastructure, intel, and more — added by registering a server, not rewiring agents.
Event-driven orchestration — an orchestrator dispatches to 8 specialist agents (operator, coder, researcher, coder-researcher, analyst, productivity, reviewer, evaluator) over a Redis event bus.
Retrieval over a private corpus + a relationship-graph service for durable, queryable memory across sessions.
Cost governance as a first-class constraint — per-request spend tracking against a hard monthly budget.
Observability + safe deploys — full request tracing (Langfuse) and automated, health-gated deployments.
Next.js front end — a canvas UI where requests spawn live, draggable result cards.

(Service/tool counts are current as of this writing and reflect the running system.)

Architecture

Pacific Slate architecture

Diagram source (Mermaid)

```mermaid flowchart TD U[User · Next.js canvas UI] -->|AG-UI over SSE| API[Cognitive API · Starlette] API --> ORCH[Orchestrator] ORCH <-->|Redis event bus| SPEC[8 specialist agents
operator · coder · researcher · analyst · …] SPEC --> ROUTER[Resilient model router
role → model + fallback chain] ROUTER --> LLMs[(Multiple LLM providers)] SPEC -->|MCP| GW[MCP gateway] GW --> TOOLS[23 MCP tool-servers
memory · knowledge/RAG · exec · git · infra · intel · browser · …] SPEC --> RAG[(Private knowledge corpus · RAG)] SPEC --> GRAPH[(Relationship graph)] API --> COST[Cost tracker · per-request budget guardrail] API --> OBS[Observability · Langfuse tracing] subgraph Infra[Dedicated server · Docker · Cloudflare Tunnel] API ORCH SPEC ROUTER GW TOOLS RAG GRAPH end ```

The problem it solves

Frontier chat tools are stateless, sanitized at the output layer, and own your data. I wanted a sovereign, always-on AI that remembers context across sessions, reasons over my own knowledge, and is composed of specialized agents I can route work to — without renting someone else’s walled garden. The hard engineering problem underneath: orchestrate many agents and tools reliably and cheaply, under real operational constraints.

Key design decisions

Role-specialized model routing over one-model-for-everything. Each agent role is mapped to the model that’s best and most cost-effective for its task (a fast reasoner for the operator, a code model for the coder, a research model for retrieval-heavy work), each with a fallback chain. This cuts cost and raises quality versus paying frontier prices for every call — and it makes the system resilient to any single provider’s outage.
MCP as the integration layer. Tools are exposed to agents through Model Context Protocol servers behind a single gateway, rather than hard-wired into each agent. New capabilities are added by registering a tool-server, not by rewriting agents. This is what lets 23 tool-servers compose cleanly — the cost is an extra network hop and a single failure domain at the gateway, which gets its own health check.
Cost as a hard constraint, not an afterthought. Every request is metered before dispatch against a hard daily cap (~$7.50/day). The router defaults to a cheaper model tier and escalates to a role’s primary only when the task warrants it; at the cap it fails closed rather than overspend. Typical model spend runs ~$50–100/month, higher during active development. Designing under a real ceiling forces genuine trade-offs — model choice, context size, when to retrieve.
Reliability you can see. Ordered fallback chains for provider failure (rate-limits are the routine case; the chain ends on a self-hosted model so it degrades to slower-but-up), request tracing on every call, and deploys gated on health checks.
Durable memory. Retrieval over a private, indexed corpus (~27,000 documents / ~295,000 chunks) plus a relationship-graph service give the system memory that persists across sessions and is queryable — not a context window that resets.

What broke (and what I changed)

The failure mode that taught me the most was a silent one. Agents would intermittently get worse — more hedging, the occasional refusal — with nothing in the error logs. It wasn’t the prompts: under budget pressure the cost router was quietly swapping an agent’s model for a cheaper, more heavily-aligned one, which returned 200 OK while behaving differently. Silent degradation is worse than a hard failure, because you debug the wrong layer for an hour first. The fix was observability on the swap itself — every model substitution now emits a trace event, so the first question on any behavioral regression is “which model actually served this call?” before anyone touches the prompt.

Most of the reliability work has had that shape: real-time outages, silently-failing models, and harness/runtime incompatibilities between the CLI and the platform — surfaced by debugging, then closed with a guardrail or a trace so the same problem can’t hide again.

What it does

Routes a request to the right specialist agent, pulls relevant context from the corpus, calls external tools through MCP, synthesizes an answer, and persists durable memory. It also runs proactive routines on a schedule — not just on demand.

How it’s built (AI-native)

I drove the architecture and the decisions; the implementation is AI-assisted. I scoped each problem, chose the model-routing and reliability and cost patterns, debugged what didn’t work, and own the running system. The AI accelerated the typing; the judgment, integration, and operation are mine. That combination — an operator who can architect and run real agentic AI — is the whole point of the project.

Selected artifacts

examples/model-routing.example.yaml — sanitized excerpt of the real role → model routing, cost-router, and fallback config.
examples/model_fallback.py — the resilient-call fallback pattern (simplified), with the design decisions worth defending in the comments.

About

Built and operated by Ryan Thompson. LinkedIn: linkedin.com/in/thompson-rc

The live system is private by design (data sovereignty). This repository documents its architecture and decisions; it does not expose the running deployment, credentials, or any personal data.