How I run 23 AI agents on a Mac Mini M4 for a real agency client

Mason Pilger · 6 min read

I want to walk through what 23 production AI agents actually look like when they're running for a real performance-marketing agency, not in a demo video. The build is Strivve Media's Meta Ads operations stack — eight specialist swarms, ~50 calls per minute at peak, all routed through a Mac Mini M4 sitting in a closet in Florida.

This isn't a theoretical post. Every claim below maps to running production code that's been live for 90 days at the time of writing.

Why a Mac Mini

When I scoped Strivve, the obvious play was the standard SaaS-AI stack: OpenAI for everything, AWS for the runtime, Datadog for the monitoring, Pinecone for the vectors. That bill would have started at $4,000/month before any agent did any actual work. For a 5-person performance agency that wasn't going to fly.

The M4 Mini does inference at 28–42 tokens/sec on Llama 4 Scout in 4-bit quantization. That's faster than the Anthropic API for most of our short-call workloads, costs $599 once, and lives in a closet in Tampa. The math wasn't subtle.

What sits on top of it:

  • Ollama 0.21 as the local inference server, tuned to keep the active model resident
  • Cloudflared quick tunnel exposing localhost:11434 to the public internet so Vercel-hosted code can reach it
  • A launchd watchdog that restarts the tunnel + re-syncs the rotating URL into Vercel env every 60 seconds
  • A bridge daemon posting CPU/RAM/GPU heartbeats to /api/ingest/heartbeat every 15s
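The heartbeat daemon is the simplest of these pieces. A minimal sketch of what it might look like, assuming Node on the Mini — the `/api/ingest/heartbeat` endpoint and 15-second interval come from the setup above, but the payload shape and function names here are illustrative (GPU stats are omitted since Node's `os` module doesn't expose them):

```typescript
import * as os from "node:os";

// Hypothetical payload shape -- the real daemon's fields aren't published.
interface Heartbeat {
  ts: string;
  cpuLoad1m: number;   // 1-minute load average
  ramUsedPct: number;  // percentage of physical RAM in use
  hostname: string;
}

function buildHeartbeat(): Heartbeat {
  const total = os.totalmem();
  const free = os.freemem();
  return {
    ts: new Date().toISOString(),
    cpuLoad1m: os.loadavg()[0],
    ramUsedPct: Math.round(((total - free) / total) * 100),
    hostname: os.hostname(),
  };
}

// Post a heartbeat every 15 seconds (interval and endpoint from the post).
function startHeartbeatLoop(baseUrl: string): void {
  setInterval(async () => {
    try {
      await fetch(`${baseUrl}/api/ingest/heartbeat`, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(buildHeartbeat()),
      });
    } catch {
      // Swallow network errors; the next tick retries.
    }
  }, 15_000);
}
```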

The watchdog took two passes to get right. The first version used an ngrok reserved domain, which broke on TLS handshake errors I never fully diagnosed (the local network has a safebrowse.io MITM that interferes with new domains). The second tried a Cloudflare named tunnel with a custom subdomain — Universal SSL never issued the cert. Third time was a quick-tunnel fallback that the watchdog auto-rediscovers when the URL rotates. Boring infrastructure that works, eventually.
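The rediscovery step is mostly log parsing. Cloudflared quick tunnels print their assigned `*.trycloudflare.com` hostname at startup, and that hostname changes on every restart; a sketch of the extraction the watchdog would need (the surrounding sync logic is omitted and the function name is mine):

```typescript
// Quick tunnels get a random https://<name>.trycloudflare.com URL at
// startup; scan cloudflared's log output for it so the watchdog can
// push the fresh URL wherever callers need it.
const TUNNEL_URL_RE = /https:\/\/[a-z0-9-]+\.trycloudflare\.com/;

function findTunnelUrl(logChunk: string): string | null {
  const match = logChunk.match(TUNNEL_URL_RE);
  return match ? match[0] : null;
}
```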

The 23 agents

The headline is "23 agents," but the truth is closer to "8 specialist swarms with multiple agents per swarm." The distinction matters because most "AI agency" sales decks call a single chatbot an agentic system. It isn't. Real ops require coordination across:

  1. Receptionist swarm (4 agents) — call routing, intent classification, calendar booking, escalation
  2. Ad creative swarm (3 agents) — variant generation, brand-voice validation, fatigue detection
  3. Reviews swarm (2 agents) — drafting, owner approval pipeline
  4. Lead scraper swarm (3 agents) — Apollo enrichment, Google Maps discovery, ICP scoring
  5. Outreach swarm (3 agents) — sequencing, deliverability monitor, reply classification
  6. Reporting swarm (3 agents) — daily P&L stitcher, weekly summary, anomaly stopper
  7. Comms swarm (2 agents) — Discord + Telegram FAQ bots
  8. Onboarding swarm (3 agents) — new-client checklist, integration auth, first-week monitor

Each swarm is its own Mastra orchestrator with a root agent that routes tasks to specialists. The specialists never talk to each other directly — they all return to the root, which has the audit trail and the right to escalate to a human.
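The hub-and-spoke pattern above can be sketched in a few lines. This is not Mastra's actual API — the class and field names are illustrative — but it captures the two invariants: specialists only ever return to the root, and the root owns the audit trail and the escalation decision:

```typescript
// Minimal sketch of the root-agent pattern: specialists never call each
// other; every result flows back through the root, which keeps the audit
// trail and escalates anything it can't route.
type Specialist = (task: string) => Promise<string>;

class RootAgent {
  readonly audit: { specialist: string; task: string; result: string }[] = [];

  constructor(private specialists: Record<string, Specialist>) {}

  async route(name: string, task: string): Promise<string> {
    const specialist = this.specialists[name];
    if (!specialist) {
      // Unknown route: escalate to a human instead of guessing.
      throw new Error(`escalate: no specialist for "${name}"`);
    }
    const result = await specialist(task);
    this.audit.push({ specialist: name, task, result });
    return result;
  }
}
```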

Routing local vs cloud

Not every request gets the local model. The router I built (src/lib/ai/rivvak-ai.ts in the public repo) prefers Ollama when:

  • The task is a "short generative" call (under ~600 tokens output)
  • No web search is needed (Puter.js handles those; the local model has no browser)
  • The model isn't currently loaded under another request
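The gate itself reduces to a three-condition check. A sketch under assumed names — the field names and function signature are mine, not the real router's, but the conditions are the three listed above:

```typescript
// Sketch of the local-vs-cloud gate. Field names are assumptions;
// the actual router lives in src/lib/ai/rivvak-ai.ts.
interface RouteInput {
  estOutputTokens: number; // estimated output length
  needsWebSearch: boolean; // local model has no browser
  modelBusy: boolean;      // Ollama already serving another request
}

function pickLane(req: RouteInput): "ollama" | "anthropic" {
  const shortGenerative = req.estOutputTokens < 600;
  if (shortGenerative && !req.needsWebSearch && !req.modelBusy) {
    return "ollama";
  }
  // Anything that fails the gate falls through to the cloud lane.
  return "anthropic";
}
```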

Anything that fails this gate falls through to Anthropic Claude Sonnet via the Vercel AI Gateway, which gives us the cost dashboard for free. The [RivvakAI] route-decision log line on every call lets us see which lane each request took — critical when something costs more than expected.

In practice, ~78% of calls in the last 30 days went local. The 22% that went to Anthropic were the calls where Sonnet's reasoning actually mattered: client P&L narration, complex orchestration decomposition, and anything that touched financial numbers we didn't want hallucinated.

The dashboard

The dashboard is the part most prospects care about, because it's the first thing that proves "this is actually running." The build uses the agent monitoring schema in src/lib/db-schema.ts:

  • agents — registry, one row per deployed agent
  • agent_runs — every orchestration run, with parent/child for sub-agent calls
  • agent_spans — OpenTelemetry-style spans inside each run (tool calls, LLM calls, retries)
  • webhook_events — every inbound webhook, deduped by payload hash
  • agent_heartbeats — Mac Mini health, ollama model list, CPU/RAM
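The webhook dedupe in that schema is worth spelling out, since replayed deliveries are the most common source of duplicate rows. A sketch of hash-based dedupe, assuming SHA-256 over the serialized payload — the in-memory `Set` here stands in for what would be a unique index on the hash column in Postgres:

```typescript
import { createHash } from "node:crypto";

// Identical payloads hash to the same key, so webhook replays are
// dropped before they ever reach the webhook_events table.
const seen = new Set<string>();

function payloadHash(payload: unknown): string {
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

function acceptWebhook(payload: unknown): boolean {
  const hash = payloadHash(payload);
  if (seen.has(hash)) return false; // duplicate delivery
  seen.add(hash);
  return true;
}
```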

The Mastra exporter at src/lib/agent-monitoring/ingest.ts writes spans into Postgres on every span end, then XADDs to an Upstash Redis stream. The dashboard's SSE route at /api/stream/agents consumes that stream and pushes events to the browser. Nothing fancy — just the right primitives wired together.
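The XADD step deserves one concrete detail: Redis streams take flat field/value pairs, not nested JSON, so the span has to be flattened before it goes on the stream and re-parsed in the SSE route. A sketch of that flattening — the span shape and field names are assumptions, not the repo's actual types:

```typescript
// Turn a finished span into the flat field/value list Redis XADD
// expects; the SSE consumer can JSON.parse the attrs field back out.
interface Span {
  runId: string;
  name: string;
  startMs: number;
  endMs: number;
  attrs: Record<string, string>;
}

function toStreamFields(span: Span): string[] {
  return [
    "runId", span.runId,
    "name", span.name,
    "durationMs", String(span.endMs - span.startMs),
    "attrs", JSON.stringify(span.attrs),
  ];
}
```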

What this changes about pricing

Most AI consultants charge $200–$400 per hour to talk about agents. That pricing model exists because most consultants don't actually build production systems — they architect on paper, hand off to a team that may or may not exist, and reappear when it doesn't work.

Running my own infrastructure — and doing it for $599 of hardware plus Vercel Pro — means I can quote fixed prices that include the build. The Strivve engagement was $2,500 setup + $3,500/mo, $0 in marginal infrastructure cost on my side. That's the only reason a one-person operation can deliver 23 agents in 90 days for less than what most agencies charge for a single chatbot wrapper.

I'm not going to pretend this scales infinitely. There's a point where a Mac Mini in a closet won't cut it (probably around 200 agents at our current per-agent volume), and at that point the right move is to lift to a co-located bare-metal box — same OS, same Ollama, just more cores. But for the segment I serve right now — 5- to 25-person performance agencies — the Mini is the correct answer, and there isn't a close second.

What's next

The next swarms I'm building for Strivve are a Forecast agent that projects each client's 30-day revenue from current ad-ops state and a Hiring agent that tells them when their actual capacity (not their headcount) is the bottleneck. Both will be Sonnet-routed, since the value of the calls comfortably justifies the cost.

If you're running a similar agency — $50K to $1M/mo Meta spend, more creative than you can ship, more reporting than your team can sustain — there's a discovery call link at the top of the site. Or read the Strivve case study for the longer-form version.

Either way, the bet is the same: production AI doesn't run on slide decks. It runs on small, opinionated systems that ship.

About the author

Mason Pilger builds AI agent systems for performance marketing agencies. Based in Florida. Currently running 23 agents in production for Strivve Media.