Architecture
The big picture
NEXUS is a configuration and routing layer, not another AI CLI. It sits beneath whatever tools you already use and feeds them shared personas and routing rules.
User Request
│
▼
AI CLI Tool ←──── reads NEXUS personas and routing rules via symlinks
(Claude Code / Gemini CLI / Kiro)
│
▼
NEXUS Orchestrator (core/NEXUS.md)
│
├──── Deep Work ────────► Cloud Models (Claude / Gemini)
│
└──── Micro-Tasks ──────► nexus-ollama MCP Server
                          │
                          └──► Local Ollama Models
Symlink architecture
The core insight: AI CLIs read their instructions from well-known file paths. NEXUS creates symlinks from those paths into its own config directory.
~/.claude/CLAUDE.md → ~/.config/nexus/core/NEXUS.md
~/.gemini/GEMINI.md → ~/.config/nexus/core/NEXUS.md
~/.kiro/steering/nexus-*.md → ~/.config/nexus/core/NEXUS.md
~/.claude/agents/ → ~/.config/nexus/personas/
Why this matters: Update one file in ~/.config/nexus/ and every AI CLI you use picks up the change immediately. No manual sync. No per-tool configuration.
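To make the symlink idea concrete, here is a minimal Python sketch that recreates three of the links above inside a throwaway sandbox directory. The function name `link_into_nexus` and the "skip if something already exists" rule are assumptions for illustration, not the NEXUS installer's actual behaviour. The Kiro entry is left out because its `nexus-*.md` pattern does not name a single file.

```python
import tempfile
from pathlib import Path


def link_into_nexus(home: Path) -> list[Path]:
    """Create the symlinks under `home` (a sandbox here, not the real home directory)."""
    nexus = home / ".config" / "nexus"
    links = {
        home / ".claude" / "CLAUDE.md": nexus / "core" / "NEXUS.md",
        home / ".gemini" / "GEMINI.md": nexus / "core" / "NEXUS.md",
        home / ".claude" / "agents": nexus / "personas",
    }
    created = []
    for link, target in links.items():
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue  # assumption: never overwrite an existing file or link
        link.symlink_to(target, target_is_directory=target.is_dir())
        created.append(link)
    return created


def demo() -> tuple[str, str]:
    """Edit NEXUS.md once and read it back through two different CLI paths."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        core = home / ".config" / "nexus" / "core"
        core.mkdir(parents=True)
        (home / ".config" / "nexus" / "personas").mkdir()
        (core / "NEXUS.md").write_text("v1")
        link_into_nexus(home)
        (core / "NEXUS.md").write_text("v2")  # a single edit in the NEXUS config
        claude = (home / ".claude" / "CLAUDE.md").read_text()
        gemini = (home / ".gemini" / "GEMINI.md").read_text()
        return claude, gemini
```

Calling `demo()` shows both CLI paths resolving to the same edited file, which is the property the symlink layout relies on.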
The Expert Orchestrator
core/NEXUS.md is the central brain. It’s loaded by every AI CLI and tells the AI:
- Scan the persona registry (~/.config/nexus/personas/) before doing any specialized work
- Delegate to the right specialist rather than doing the work directly
- Route micro-tasks to the local compute plane when possible
- Manage context: compact at 50% context usage, stop spawning at 75%
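The context thresholds in the last rule can be read as a simple decision function. The sketch below is only an illustration: in NEXUS the rule is an instruction in core/NEXUS.md that the model follows, not code. The function name, the returned labels, and the choice to treat both thresholds as inclusive fractions of the context window are assumptions.

```python
def context_action(used_tokens: int, window_tokens: int) -> str:
    """Map context usage to the action suggested by the 50% / 75% rule."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage >= 0.75:
        return "stop_spawning"  # assumption: threshold is inclusive
    if usage >= 0.50:
        return "compact"        # assumption: threshold is inclusive
    return "continue"
```

For example, `context_action(120_000, 200_000)` falls between the two thresholds (60% usage) and yields `"compact"`.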
If no relevant persona exists for a task, the orchestrator asks the user to:
- Create a new persona
- Promote one from the archive
- Proceed without a specialist
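The lookup-then-fallback flow above can be sketched as follows. This is a hypothetical model of the behaviour, assuming personas are matched by filename keyword. The real orchestrator is an instruction file read by the AI, so the actual matching is whatever the model decides. The names `find_personas`, `plan`, and the persona `security-auditor` used in the usage note are made up for illustration.

```python
from pathlib import Path

# The three choices offered to the user when no persona fits.
FALLBACK_OPTIONS = (
    "create a new persona",
    "promote one from the archive",
    "proceed without a specialist",
)


def find_personas(registry: Path, keyword: str) -> list[str]:
    """Return persona names (file stems) whose filename contains the keyword."""
    needle = keyword.lower()
    return sorted(p.stem for p in registry.glob("*.md") if needle in p.stem.lower())


def plan(registry: Path, keyword: str) -> dict:
    """Delegate to the first matching persona, or hand the choice back to the user."""
    matches = find_personas(registry, keyword)
    if matches:
        return {"delegate_to": matches[0]}
    return {"ask_user": list(FALLBACK_OPTIONS)}
```

With a registry containing only `security-auditor.md`, `plan(registry, "security")` delegates to that persona, while `plan(registry, "database")` returns the three user options.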
Task routing
NEXUS routes tasks based on complexity:
| Task type | Destination | Examples |
|---|---|---|
| Structured generation | Local (supervisor band, 1.5B) | Commit messages, boilerplate, test scaffolds |
| Code transformation | Local (logic band, 3B) | Lint fixes, refactors |
| Deep reasoning | Cloud (Claude/Gemini) | Architecture decisions, novel debugging |
| 7B+ inference | Local only if >12GB VRAM | Full system architecture generation |
The routing decision lives in core/NEXUS.md. Starting with v0.3.0, routing becomes dynamic and latency-aware.
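The routing table can be expressed as a small lookup, shown here purely as an illustration of the static rules. NEXUS keeps these rules as prose in core/NEXUS.md rather than as code. The task-type keys, the `Route` type, and the two cloud fallbacks (for unknown task types and for 7B+ work on machines with 12 GB of VRAM or less) are assumptions, not documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    plane: str  # "local" or "cloud"
    band: str   # model band or provider group


CLOUD = Route("cloud", "Claude/Gemini")

# One entry per unconditional row of the routing table.
ROUTES = {
    "structured_generation": Route("local", "supervisor (1.5B)"),
    "code_transformation": Route("local", "logic (3B)"),
    "deep_reasoning": CLOUD,
}


def route(task_type: str, vram_gb: float = 0.0) -> Route:
    """Pick a destination for a task, following the table's rows."""
    if task_type == "7b_plus_inference":
        # Table: local only if >12GB VRAM. Falling back to cloud is an assumption.
        return Route("local", "7B+") if vram_gb > 12 else CLOUD
    # Assumption: anything unrecognized goes to the cloud.
    return ROUTES.get(task_type, CLOUD)
```

A commit-message request would map to `route("structured_generation")`, which resolves to the local supervisor band.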
The local compute plane
The nexus-ollama MCP server is a Node.js process that exposes six tools via the Model Context Protocol. When an AI CLI has the MCP server configured, it calls these tools automatically instead of using cloud inference.
The server respects two environment variables:
NEXUS_SUPERVISOR_MODEL="qwen2.5-coder:1.5b" # for structured generation
NEXUS_LOGIC_MODEL="llama3.2:3b" # for code reasoning
If the Ollama server is unreachable, the MCP tool returns CIRCUIT_BREAKER and the AI CLI falls back to handling the task directly in the cloud.
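The model selection and fallback behaviour described above might look roughly like this. The sketch is in Python for brevity, while the actual server is a Node.js process whose internals are not shown here. Treating the two example values as defaults, the helper names, and the reachability probe are assumptions. `http://localhost:11434` is Ollama's default listen address.

```python
import os
import urllib.request
from typing import Mapping

# Assumption: the values shown above double as defaults when the variables are unset.
DEFAULT_MODELS = {
    "supervisor": ("NEXUS_SUPERVISOR_MODEL", "qwen2.5-coder:1.5b"),
    "logic": ("NEXUS_LOGIC_MODEL", "llama3.2:3b"),
}


def pick_model(band: str, env: Mapping[str, str] = os.environ) -> str:
    """Resolve the model for a band from the environment."""
    variable, default = DEFAULT_MODELS[band]
    return env.get(variable, default)


def ollama_reachable(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if an HTTP request to the Ollama base URL succeeds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except OSError:  # covers URLError, refused connections, and timeouts
        return False


def run_micro_task(band: str, base_url: str = "http://localhost:11434") -> str:
    """Return CIRCUIT_BREAKER when Ollama is down, else the model that would run."""
    if not ollama_reachable(base_url):
        return "CIRCUIT_BREAKER"  # the caller then handles the task in the cloud
    return f"local:{pick_model(band)}"
```

A caller that sees `"CIRCUIT_BREAKER"` simply continues with cloud inference, so an offline Ollama degrades the setup rather than breaking it.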
Project structure
core/          Core orchestrator instructions (NEXUS.md, CLAUDE.md)
personas/      Agent persona definitions (.md files)
tools/tui/     NEXUS TUI binary (Go / Bubbletea v2)
tools/mcp/     Ollama MCP server (Node.js / Hono)
prompts/       Engineering rules and quality gate prompts
mcp-configs/   MCP configuration templates for each CLI
docs/          Documentation and hardware-specific presets
tests/         Integration tests (install/uninstall cycle)
Agent memory
NEXUS supports per-project agent memory. At the start of any project-scoped task, the orchestrator checks:
~/.config/nexus/agent-memory/<project-name>/
If memory files exist, they’re read before any analysis. This lets you persist decisions, preferences, and blockers across sessions without putting them in the repo.
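As a rough illustration of the memory check, the sketch below reads every file in a project's memory directory before any other work starts. The function name, the plain-text assumption about file contents, and the flat (non-recursive) directory layout are all hypothetical. The only detail taken from the text above is the `~/.config/nexus/agent-memory/<project-name>/` location.

```python
from pathlib import Path
from typing import Optional


def load_agent_memory(project: str, root: Optional[Path] = None) -> dict[str, str]:
    """Return {filename: contents} for a project's memory files, or {} if none exist."""
    if root is None:
        root = Path.home() / ".config" / "nexus" / "agent-memory"
    memory_dir = root / project
    if not memory_dir.is_dir():
        return {}  # no memory yet: proceed straight to analysis
    return {
        path.name: path.read_text(encoding="utf-8")  # assumption: UTF-8 text files
        for path in sorted(memory_dir.iterdir())
        if path.is_file()
    }
```

Because the directory lives under `~/.config/nexus/` rather than in the project checkout, notes loaded this way persist across sessions without ever being committed.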