Architecture

The big picture

NEXUS is a configuration and routing layer, not another AI CLI. It sits beneath the AI CLIs you already use and supplies them with shared personas and routing rules.

User Request
    │
    ▼
AI CLI Tool  ←──── reads NEXUS personas and routing rules via symlinks
(Claude Code / Gemini CLI / Kiro)
    │
    ▼
NEXUS Orchestrator (core/NEXUS.md)
    │
    ├──── Deep Work ────────► Cloud Models (Claude / Gemini)
    │
    └──── Micro-Tasks ──────► nexus-ollama MCP Server
                                    │
                                    └──► Local Ollama Models

Symlink architecture

The core insight: AI CLIs read their instructions from well-known file paths. NEXUS creates symlinks from those paths into its own config directory.

~/.claude/CLAUDE.md          → ~/.config/nexus/core/NEXUS.md
~/.gemini/GEMINI.md          → ~/.config/nexus/core/NEXUS.md
~/.kiro/steering/nexus-*.md  → ~/.config/nexus/core/NEXUS.md
~/.claude/agents/            → ~/.config/nexus/personas/
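
As a rough illustration of the mapping above, the following Python sketch creates the same links under an arbitrary home directory. It is not the NEXUS installer: the function names, the choice to skip paths that already hold a real file, and the omission of the Kiro glob entry (its concrete file names are not given here) are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Link path -> target, both relative to the home directory (from the mapping above).
# The Kiro glob entry is left out because its concrete file names are not listed.
LINKS = {
    ".claude/CLAUDE.md": ".config/nexus/core/NEXUS.md",
    ".gemini/GEMINI.md": ".config/nexus/core/NEXUS.md",
    ".claude/agents": ".config/nexus/personas",
}


def install_links(home: Path) -> list[Path]:
    """Create each symlink under `home`; replace old symlinks, never real files."""
    created = []
    for link_rel, target_rel in LINKS.items():
        link = home / link_rel
        target = home / target_rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # an existing symlink is only a pointer, safe to replace
        elif link.exists():
            continue  # assumption: leave real files or directories for the user
        link.symlink_to(target, target_is_directory=target.is_dir())
        created.append(link)
    return created


def demo() -> str:
    """Install links into a throwaway home, edit the source file, read via a link."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        core = home / ".config/nexus/core"
        core.mkdir(parents=True)
        (home / ".config/nexus/personas").mkdir()
        (core / "NEXUS.md").write_text("v1")
        install_links(home)
        (core / "NEXUS.md").write_text("v2")  # edit the single source of truth
        return (home / ".claude/CLAUDE.md").read_text()
```

Calling demo() reads the edited content through the Claude path without any copy step, which is the property the symlink layout relies on.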

Why this matters: Update one file in ~/.config/nexus/ and every linked AI CLI sees the change the next time it reads its instructions. No manual sync. No per-tool configuration.

The Expert Orchestrator

core/NEXUS.md is the central brain. It’s loaded by every AI CLI and tells the AI to:

Scan the persona registry (~/.config/nexus/personas/) before doing any specialized work
Delegate to the right specialist rather than doing the work directly
Route micro-tasks to the local compute plane when possible
Manage context: compact at 50% of the context window, stop spawning specialists at 75%
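
The context rule is written as prose instructions for the model, not as code. Purely to make the two thresholds concrete, here is a minimal Python sketch of the same decision. It assumes the percentages refer to the share of the context window in use and that both thresholds are inclusive; the function name and the returned labels are hypothetical.

```python
COMPACT_AT = 0.50        # "compact at 50%"
STOP_SPAWNING_AT = 0.75  # "stop spawning at 75%"


def context_action(used_tokens: int, window_tokens: int) -> str:
    """Map current context usage to the action the rule above calls for."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage >= STOP_SPAWNING_AT:
        return "stop_spawning"
    if usage >= COMPACT_AT:
        return "compact"
    return "continue"
```

With a 200,000-token window, 40,000 tokens in use maps to "continue", 100,000 to "compact", and 150,000 to "stop_spawning".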

If no relevant persona exists for a task, the orchestrator asks the user to:

Create a new persona
Promote one from the archive
Proceed without a specialist
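
The lookup-then-ask behaviour can be pictured with a small Python sketch. The orchestrator itself is a set of instructions followed by the model, so this is only an analogy: matching personas by file name, the persona file names in the demo, and the shape of the returned dictionary are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

# The three choices offered to the user when no persona fits (from the list above).
FALLBACK_OPTIONS = [
    "create a new persona",
    "promote one from the archive",
    "proceed without a specialist",
]


def find_personas(registry: Path, keyword: str) -> list[str]:
    """Return persona names whose file name contains the keyword (assumed matching rule)."""
    needle = keyword.lower()
    return sorted(p.stem for p in registry.glob("*.md") if needle in p.stem.lower())


def resolve(registry: Path, keyword: str) -> dict:
    """Delegate to the first matching persona, or ask the user to pick a fallback."""
    matches = find_personas(registry, keyword)
    if matches:
        return {"action": "delegate", "persona": matches[0]}
    return {"action": "ask_user", "options": FALLBACK_OPTIONS}


def demo() -> tuple[dict, dict]:
    """Build a throwaway registry with two hypothetical personas and query it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        registry = Path(tmp)
        for name in ("security-auditor.md", "frontend-engineer.md"):
            (registry / name).write_text("# persona\n")
        return resolve(registry, "security"), resolve(registry, "database")
```

In the demo, the "security" query delegates to the matching persona, while the "database" query finds nothing and returns the three options for the user.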

Task routing

NEXUS routes tasks based on complexity:

Task type	Destination	Examples
Structured generation	Local (supervisor band, 1.5B)	Commit messages, boilerplate, test scaffolds
Code transformation	Local (logic band, 3B)	Lint fixes, refactors
Deep reasoning	Cloud (Claude/Gemini)	Architecture decisions, novel debugging
7B+ inference	Local only if >12GB VRAM	Full system architecture generation

The routing decision lives in core/NEXUS.md. Once v0.3.0 ships, routing becomes dynamic and latency-aware.
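
Read as a lookup, the table above might be sketched in Python like this. The task-type keys, the Route type, and two behaviours the table does not spell out (sending 7B+ work to the cloud when VRAM is 12 GB or less, and defaulting unknown task types to the cloud) are assumptions of this example, not documented NEXUS behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    plane: str  # "local" or "cloud"
    band: str   # label taken from the Destination column


CLOUD = Route("cloud", "Claude/Gemini")

# Static rows of the routing table; keys are hypothetical identifiers.
ROUTES = {
    "structured_generation": Route("local", "supervisor (1.5B)"),
    "code_transformation": Route("local", "logic (3B)"),
    "deep_reasoning": CLOUD,
}


def route(task_type: str, vram_gb: float = 0.0) -> Route:
    """Pick a destination for a task, following the table row by row."""
    if task_type == "large_inference":  # the "7B+ inference" row
        if vram_gb > 12:
            return Route("local", "7B+")
        return CLOUD  # assumption: without enough VRAM the task goes to the cloud
    return ROUTES.get(task_type, CLOUD)  # assumption: unknown types go to the cloud
```

For example, route("code_transformation") selects the local logic band, and route("large_inference", vram_gb=16) stays local while the same call with 8 GB goes to the cloud.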

The local compute plane

The nexus-ollama MCP server is a Node.js process that exposes six tools via the Model Context Protocol. When an AI CLI has the MCP server configured, it calls these tools automatically instead of using cloud inference.

The server respects two environment variables:

NEXUS_SUPERVISOR_MODEL="qwen2.5-coder:1.5b"  # for structured generation
NEXUS_LOGIC_MODEL="llama3.2:3b"               # for code reasoning

If the Ollama server is unreachable, the MCP tool returns CIRCUIT_BREAKER and the AI CLI falls back to handling the task directly in the cloud.
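
A minimal Python sketch of the model selection and fallback flow described above, using stand-in callables in place of the real MCP tool call and the cloud model. It assumes the values shown for the two environment variables are also the defaults, and that CIRCUIT_BREAKER arrives as the tool's plain-text result; the function names and return shape are hypothetical.

```python
import os
from typing import Callable

CIRCUIT_BREAKER = "CIRCUIT_BREAKER"

# Band -> (environment variable, assumed default model).
MODEL_VARS = {
    "supervisor": ("NEXUS_SUPERVISOR_MODEL", "qwen2.5-coder:1.5b"),
    "logic": ("NEXUS_LOGIC_MODEL", "llama3.2:3b"),
}


def model_for(band: str) -> str:
    """Resolve the model for a band, letting the environment override the default."""
    var, default = MODEL_VARS[band]
    return os.environ.get(var, default)


def run_micro_task(
    prompt: str,
    band: str,
    local: Callable[[str, str], str],  # stand-in for the MCP tool: (model, prompt) -> text
    cloud: Callable[[str], str],       # stand-in for cloud handling: (prompt) -> text
) -> tuple[str, str]:
    """Try the local plane first; fall back to the cloud on CIRCUIT_BREAKER."""
    result = local(model_for(band), prompt)
    if result == CIRCUIT_BREAKER:
        return ("cloud", cloud(prompt))
    return ("local", result)
```

The first element of the returned pair records which plane actually produced the answer, which makes the fallback visible to the caller.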

Project structure

core/           Core orchestrator instructions (NEXUS.md, CLAUDE.md)
personas/       Agent persona definitions (.md files)
tools/tui/      NEXUS TUI binary (Go / Bubble Tea v2)
tools/mcp/      Ollama MCP server (Node.js / Hono)
prompts/        Engineering rules and quality gate prompts
mcp-configs/    MCP configuration templates for each CLI
docs/           Documentation and hardware-specific presets
tests/          Integration tests (install/uninstall cycle)

Agent memory

NEXUS supports per-project agent memory. At the start of any project-scoped task, the orchestrator checks:

~/.config/nexus/agent-memory/<project-name>/

If memory files exist, they’re read before any analysis. This lets you persist decisions, preferences, and blockers across sessions without putting them in the repo.
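
The memory check can be sketched in Python as a directory read that returns nothing when the project has no memory yet. The function name, the flat one-level layout, and the file names used in the demo are assumptions; only the directory path pattern comes from the text above.

```python
import tempfile
from pathlib import Path


def load_memory(memory_root: Path, project: str) -> dict[str, str]:
    """Return {file name: contents} for a project's memory files, or {} if none exist."""
    project_dir = memory_root / project
    if not project_dir.is_dir():
        return {}
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(project_dir.iterdir())
        if p.is_file()
    }


def demo() -> tuple[dict[str, str], dict[str, str]]:
    """Create memory for one hypothetical project and load it, plus a missing project."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for ~/.config/nexus/agent-memory
        project_dir = root / "my-project"
        project_dir.mkdir()
        (project_dir / "decisions.md").write_text("Use PostgreSQL\n", encoding="utf-8")
        return load_memory(root, "my-project"), load_memory(root, "other-project")
```

Returning an empty dictionary for a project without memory keeps the "read before any analysis" step unconditional for the caller.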