Satori Architecture

How Satori gives MCP coding agents symbol-owned repo search, exact reads, bounded graph context, and freshness checks without hiding state or failures.

MCP runtime · Vector search · Symbol registry · Relationship sidecar · Snapshot state

System Overview

Agent requests route through a fixed six-tool registry. The MCP runtime enforces freshness/fingerprint gates and orchestrates search. Core sync provides deterministic file-state truth for indexing.

[Architecture diagram. Components: Agent (Claude / Codex / Cursor); Tool Registry (6 MCP tools); MCP Runtime (ToolHandlers and orchestration); Freshness Gate (snapshot and fingerprint gates); Vector Store (Zilliz / Milvus); Navigation Sidecars (symbols / relationships); Core Sync (file sync / Merkle / hash).]

Agent

Claude Code, Codex, Cursor-style workflows, or another MCP client.

Tool Registry

Exposes six agent-safe operations instead of a wide internal API.

MCP Runtime

Coordinates indexing, search, sync, freshness, and lifecycle state.

Storage

Vector store, symbol registry, relationship sidecars, local snapshots, and Merkle state.

@zokizuan/satori-core

Owns indexing, sync, chunking, metadata, and retrieval primitives.

@zokizuan/satori-mcp

Owns the MCP server, six-tool contract, lifecycle state, and agent envelopes.

@zokizuan/satori-cli

Installs managed client config, copies the first-party skill, and exposes shell tool calls.

Runtime Boundaries

MCP Runtime Owns

  • Tool contracts and schema validation.
  • Runtime snapshot state and lifecycle status.
  • Index freshness, fingerprint gates, and recovery guidance.
  • Agent-safe responses, warnings, and refusal paths.
  • Coordination across vector store, navigation sidecars, and core engine.

Core Engine Owns

  • Stat-first sync, hash-on-change, and deterministic Merkle roots.
  • Path normalization and snapshot diff primitives.
  • Partial-scan safety for unreadable files and directories.
  • AST-aware chunking, symbol ownership, relationship sidecars, and ignore filtering.
  • Low-level indexing/search primitives behind MCP orchestration.

Installer Boundary

Public setup is installer-owned. Supported clients should not pay package-manager resolution cost during every MCP startup, and users should not have to copy cache paths into each harness by hand.

One public command

satori-cli install --client all resolves the MCP runtime once, writes managed client config, and copies the first-party workflow skill.

Stable resident startup

Client config launches the installer-owned Node launcher at ~/.satori/bin/satori-mcp.js. Resident startup does not call npx, npm, or a package manager.

Harness formats stay behind CLI

Codex, Claude, and OpenCode config differences live in the installer and its tests, not in user-maintained snippets.

Agent-Safe Tool Surface

Satori exposes six MCP tools. The point is not a large API; it is a small set of operations that return state, warnings, and next steps when context is unsafe. The MCP server does not expose source-code write tools.

Discovery

list_codebases

Lists tracked roots grouped by ready, indexing, failed, or requires-reindex state.

Refuses to flatten state into a single success/failure bit.

search_codebase

Finds relevant code from plain-English behavior queries or exact identifier lookups. Results are grouped by owning symbol and carry freshness checks, recommended actions, structured warnings, and navigation hints.

Refuses stale or unsafe context when index gates require attention.

Inspection

file_outline

Maps symbols in a file and resolves exact labels or IDs through sidecar metadata.

Refuses to guess when exact symbol resolution is ambiguous.

call_graph

Traverses callers and callees from a search-provided symbolRef.

Refuses to fake graph data when the sidecar is absent or not ready.

read_file

Reads files, line ranges, or exact symbols with optional annotated outline status.

Refuses unlimited context dumps; large reads are bounded.

Lifecycle

manage_index

Creates, syncs, reindexes, checks status, clears, and handles force-reindex recovery paths.

Refuses destructive ambiguity: clear is explicit, backend timeouts preserve local state, and mixed live runtime owners block mutation.

Navigation Sidecars

Satori separates retrieval from navigation. The vector store finds candidate evidence; the symbol registry and relationship sidecar explain what that evidence belongs to.

Symbol Registry

Files remain the source of truth. The registry is a derived view for one compatible indexed snapshot, with owner symbol keys, exact symbol instances, file-owner fallbacks, and outline records.

Relationship Sidecar

The current sidecar stores conservative CALLS v0 edges plus TS/JS relative IMPORTS and EXPORTS v0 edges. Ambiguous same-name targets are skipped rather than guessed.

Search Contract

search_codebase returns exact registry hits or symbols in grouped mode with action guidance, capability confidence, structured warnings, and fallbacks. Matching chunks are evidence for those symbols, not the final unit the agent should edit from.

Graph Boundary

call_graph traverses compatible relationship sidecars directly. JSON sidecars remain canonical. SQLite is an additive cache used for validation or explicit experimental reads, and it can serve reads only after parity with the JSON registry and relationship sidecars is proven.

State and Safety

Satori treats index state as part of the contract. Agents do not just receive search results; they also receive status, freshness, and recovery guidance when the index is unsafe or incomplete.

[State diagram. States: not_indexed, indexing, ready, failed, sync_completed, requires_reindex. Transition labels: manage_index(create), success, failure, manage_index(sync), auto-sync, fingerprint mismatch, manage_index(reindex), and manage_index(clear) from any state.]
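The lifecycle can be read as a small transition table. The sketch below is one possible reading, not Satori's implementation: the state names come from the diagram, while the event names, the exact edges between states, and the nextState helper are assumptions made for illustration.

```typescript
// Hypothetical sketch: state names come from the diagram labels; the
// transition table is an assumed reading of it, not Satori's actual code.
type IndexState =
  | "not_indexed"
  | "indexing"
  | "ready"
  | "failed"
  | "sync_completed"
  | "requires_reindex";

type IndexEvent =
  | "create"
  | "success"
  | "failure"
  | "sync"
  | "fingerprint_mismatch"
  | "reindex"
  | "clear";

// Allowed transitions per state. "clear" is handled separately because the
// diagram marks it as valid from any state.
const transitions: Record<IndexState, Partial<Record<IndexEvent, IndexState>>> = {
  not_indexed: { create: "indexing" },
  indexing: { success: "ready", failure: "failed" },
  ready: { sync: "sync_completed", fingerprint_mismatch: "requires_reindex", reindex: "indexing" },
  sync_completed: { sync: "sync_completed", fingerprint_mismatch: "requires_reindex", reindex: "indexing" },
  failed: { create: "indexing", reindex: "indexing" },
  requires_reindex: { reindex: "indexing" },
};

function nextState(current: IndexState, event: IndexEvent): IndexState {
  if (event === "clear") return "not_indexed";
  const target = transitions[current][event];
  if (target === undefined) {
    // Refuse instead of silently staying put, so callers see the bad request.
    throw new Error(`Event "${event}" is not allowed in state "${current}"`);
  }
  return target;
}
```

In this reading, a requires_reindex index cannot be synced back to health; only an explicit reindex or clear moves it forward.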

Fingerprint Fields

{ embeddingProvider, embeddingModel, embeddingDimension, vectorStoreProvider, schemaVersion }

Safety Result

Prevents silent reads from incompatible embedding models/dimensions or schema-mode mismatches.
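A minimal sketch of such a gate, assuming the stored index fingerprint is compared field by field against the current runtime configuration. The field names are the ones listed above; the field types, the helper names, and the result shape are hypothetical.

```typescript
// Field names come from the fingerprint list above; the types, helper names,
// and result shape are illustrative assumptions.
interface IndexFingerprint {
  embeddingProvider: string;
  embeddingModel: string;
  embeddingDimension: number;
  vectorStoreProvider: string;
  schemaVersion: string;
}

const FINGERPRINT_FIELDS = [
  "embeddingProvider",
  "embeddingModel",
  "embeddingDimension",
  "vectorStoreProvider",
  "schemaVersion",
] as const;

// Returns the names of fields that differ between the stored index
// fingerprint and the current runtime configuration.
function fingerprintMismatches(stored: IndexFingerprint, runtime: IndexFingerprint): string[] {
  return FINGERPRINT_FIELDS.filter((field) => stored[field] !== runtime[field]);
}

type GateResult =
  | { ok: true }
  | { ok: false; status: "requires_reindex"; mismatched: string[] };

// Any mismatch blocks reads and reports which fields caused it.
function fingerprintGate(stored: IndexFingerprint, runtime: IndexFingerprint): GateResult {
  const mismatched = fingerprintMismatches(stored, runtime);
  return mismatched.length === 0
    ? { ok: true }
    : { ok: false, status: "requires_reindex", mismatched };
}
```

Returning the mismatched field names, rather than a bare boolean, lets the caller explain why a reindex is needed.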

Search and Sync Behavior

The behavior below is shown directly because it is the core trust model: search is freshness-aware, sync is incremental, and cloud lifecycle operations are verified before local state is discarded.

Plain-English query -> semantic search -> owner symbol -> exact reads -> freshness check -> agent guidance

Search Pipeline

Freshness gate -> operator parse -> dense/BM25 retrieval -> deterministic filters -> rerank policy -> owner-symbol grouping -> stable tie-breaks.
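The last two stages, owner-symbol grouping and stable tie-breaks, can be sketched as follows. The ChunkHit and SymbolGroup shapes, the symbol key format, and the choice of "best chunk score, then symbol key" as the ordering are assumptions for illustration, not Satori's actual ranking rules.

```typescript
// Hypothetical shapes: a scored chunk that records which symbol owns it.
interface ChunkHit {
  ownerSymbol: string; // made-up key format, e.g. "src/auth.ts#login"
  file: string;
  startLine: number;
  score: number;
}

interface SymbolGroup {
  ownerSymbol: string;
  bestScore: number;
  evidence: ChunkHit[];
}

function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Collapses chunk hits into one group per owning symbol. Chunks become
// evidence for the symbol instead of being returned as separate results.
function groupByOwner(hits: ChunkHit[]): SymbolGroup[] {
  const groups = new Map<string, SymbolGroup>();
  for (const hit of hits) {
    const group = groups.get(hit.ownerSymbol);
    if (group === undefined) {
      groups.set(hit.ownerSymbol, {
        ownerSymbol: hit.ownerSymbol,
        bestScore: hit.score,
        evidence: [hit],
      });
    } else {
      group.bestScore = Math.max(group.bestScore, hit.score);
      group.evidence.push(hit);
    }
  }
  // Sort by score, then by symbol key, so equal scores always come out in
  // the same order regardless of input order.
  return Array.from(groups.values()).sort(
    (a, b) => b.bestScore - a.bestScore || compareText(a.ownerSymbol, b.ownerSymbol)
  );
}
```

The explicit secondary key is what makes the output reproducible: two runs over the same hits return the same ordering even when scores tie.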

Hybrid Retrieval

Dense vector hits and BM25 sparse keyword hits merge through reciprocal rank fusion, so concepts and exact tokens both count.
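Reciprocal rank fusion scores each result as the sum of 1 / (k + rank) across the ranked lists it appears in. The sketch below shows the technique in isolation; the function name, the k = 60 default (a commonly used value), and the ID-based tie-break are assumptions, since Satori's actual parameters are not stated here.

```typescript
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank starts at 1. k = 60 is an assumed default.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties fall back to ID order for stability.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)
  );
}

// Hypothetical chunk IDs: one list from dense retrieval, one from BM25.
const dense = ["chunkA", "chunkB", "chunkC"];
const sparse = ["chunkC", "chunkA", "chunkD"];
const fused = reciprocalRankFusion([dense, sparse]);
```

Here chunkA and chunkC appear in both lists, so they outrank chunkB and chunkD, which each appear in only one. The fused order is chunkA, chunkC, chunkB, chunkD.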

Language Capabilities

TypeScript, JavaScript, and Python are the only production-ready call_graph languages. Go and Rust are symbol_only: compatible registries can power outlines, while graph traversal returns unsupported_language.
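One way to picture this is a capability table consulted before graph traversal. The language tiers and the unsupported_language reason come from the text above; the table layout, the "call_graph" tier label, the helper names, and the treatment of unlisted languages as unsupported are assumptions.

```typescript
// Tiers follow the text above; everything else here is a hypothetical sketch.
type GraphCapability = "call_graph" | "symbol_only";

const LANGUAGE_CAPABILITIES: Record<string, GraphCapability> = {
  typescript: "call_graph",
  javascript: "call_graph",
  python: "call_graph",
  go: "symbol_only",
  rust: "symbol_only",
};

type GraphCheck = { ok: true } | { ok: false; reason: "unsupported_language" };

// Graph traversal is allowed only for call_graph languages. Unlisted
// languages are treated as unsupported (an assumption of this sketch).
function checkGraphSupport(language: string): GraphCheck {
  return LANGUAGE_CAPABILITIES[language] === "call_graph"
    ? { ok: true }
    : { ok: false, reason: "unsupported_language" };
}

// Outlines work for both tiers, as long as the language is listed at all.
function supportsOutline(language: string): boolean {
  return Object.prototype.hasOwnProperty.call(LANGUAGE_CAPABILITIES, language);
}
```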

Ignore Rules

Uses .gitignore, .satoriignore, repo-local satori.toml, and configured patterns. Signature checks converge even without watcher events.

Index Profiles

satori.toml selects default, minimal, or all-text. Profiles control what enters the index; search scope controls what gets queried.

Incremental Sync

Stat-first scan, hash-on-change, deterministic Merkle root, and changed-file chunk updates. Navigation reuses changed-file symbol output, preserves unchanged registry state, and recomputes relationships globally without re-splitting unchanged files.
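The stat-first, hash-on-change, and deterministic-root steps can be sketched over an in-memory snapshot. Everything named here is hypothetical: the FileStat and SnapshotEntry shapes, the syncSnapshot and merkleRoot helpers, and the FNV-1a-style stand-in hash, which only keeps the example dependency-free and is not a suitable content hash for a real index.

```typescript
// Stand-in content hash (FNV-1a style, 32-bit). Illustration only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

interface FileStat { path: string; size: number; mtimeMs: number; }
interface SnapshotEntry extends FileStat { hash: string; }
type Snapshot = Map<string, SnapshotEntry>;

interface SyncResult {
  snapshot: Snapshot;
  changed: string[];
  removed: string[];
  hashedCount: number;
  merkleRoot: string;
}

// Deterministic root: entries are sorted by path, so scan order never
// affects the result. An odd node at any level is paired with itself.
function merkleRoot(snapshot: Snapshot): string {
  let level = Array.from(snapshot.values())
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((entry) => fnv1a(`${entry.path}\n${entry.hash}`));
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i] ?? "";
      const right = level[i + 1] ?? left;
      next.push(fnv1a(left + right));
    }
    level = next;
  }
  return level[0] ?? fnv1a("");
}

function syncSnapshot(
  previous: Snapshot,
  stats: FileStat[],
  readFile: (path: string) => string
): SyncResult {
  const snapshot: Snapshot = new Map();
  const changed: string[] = [];
  let hashedCount = 0;
  for (const stat of stats) {
    const old = previous.get(stat.path);
    // Stat-first: identical size and mtime means the old hash is reused
    // without reading the file.
    if (old !== undefined && old.size === stat.size && old.mtimeMs === stat.mtimeMs) {
      snapshot.set(stat.path, old);
      continue;
    }
    // Hash-on-change: only files whose stat differs are read and hashed.
    const hash = fnv1a(readFile(stat.path));
    hashedCount++;
    if (old === undefined || old.hash !== hash) changed.push(stat.path);
    snapshot.set(stat.path, { path: stat.path, size: stat.size, mtimeMs: stat.mtimeMs, hash });
  }
  const removed = Array.from(previous.keys()).filter((path) => !snapshot.has(path));
  return { snapshot, changed, removed, hashedCount, merkleRoot: merkleRoot(snapshot) };
}
```

Only the paths in changed would need re-chunking. A file whose mtime moved but whose content hash did not is hashed once and then left out of changed.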

Fail-Closed Navigation

Missing, stale, incompatible, partial, or recovery-failed navigation does not publish fake graph truth. Public tools return precise reasons such as missing_symbol_registry, missing_relationship_sidecar, or partial_index_navigation_unavailable.
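A fail-closed result can be modeled as a union type in which the unavailable branch carries no graph data at all. The three reason strings are the ones named above; the NavigationState flags, the result shape, the check order, and the callGraph helper signature are assumptions for illustration.

```typescript
// Reason codes are quoted from the text above; the rest is hypothetical.
type NavigationUnavailableReason =
  | "missing_symbol_registry"
  | "missing_relationship_sidecar"
  | "partial_index_navigation_unavailable";

interface NavigationState {
  hasSymbolRegistry: boolean;
  hasRelationshipSidecar: boolean;
  indexIsPartial: boolean;
}

interface GraphEdges { callers: string[]; callees: string[]; }

type GraphResult =
  | ({ status: "ok" } & GraphEdges)
  | { status: "unavailable"; reason: NavigationUnavailableReason };

// Every precondition is checked before the sidecar lookup runs, so a
// failed check can never return partial or invented edges.
function callGraph(state: NavigationState, lookup: () => GraphEdges): GraphResult {
  if (state.indexIsPartial) {
    return { status: "unavailable", reason: "partial_index_navigation_unavailable" };
  }
  if (!state.hasSymbolRegistry) {
    return { status: "unavailable", reason: "missing_symbol_registry" };
  }
  if (!state.hasRelationshipSidecar) {
    return { status: "unavailable", reason: "missing_relationship_sidecar" };
  }
  const edges = lookup();
  return { status: "ok", callers: edges.callers, callees: edges.callees };
}
```

Because the unavailable branch has no callers or callees fields, a caller cannot mistake a missing sidecar for a symbol with no edges.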

Provider Behavior

Cloud and local providers use different result budgets. Rerank is capability-driven, docs scope skips rerank, and failures degrade with warnings.

Subdirectory Roots

Search can start from a package path inside an indexed parent. Satori resolves the effective root while keeping navigation fallbacks executable.

No Passive Cloud Repair

Remote collection existence is not readiness proof. Missing or incompatible local readiness is recovered only through explicit create or reindex flows with matching completion proof.

Verified Delete

Clear and force-reindex mutate local snapshot/Merkle state only after remote deletion is verified or the collection is already absent.
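The ordering rule can be sketched as a pure decision over the outcome of the remote delete. The outcome names, the LocalIndexState shape, and the applyClear helper are hypothetical; the rule itself (discard local state only on verified deletion or an already-absent collection, preserve it on timeout) follows the text.

```typescript
// Hypothetical outcome labels for a remote collection delete attempt.
type RemoteDeleteOutcome = "deleted_verified" | "already_absent" | "timeout" | "unverified";

interface LocalIndexState {
  snapshot: string | null;
  merkleRoot: string | null;
}

interface ClearResult {
  local: LocalIndexState;
  cleared: boolean;
  warning?: string;
}

// Local snapshot and Merkle state are dropped only when the remote side is
// known to be gone. Any other outcome keeps local state and adds a warning.
function applyClear(local: LocalIndexState, outcome: RemoteDeleteOutcome): ClearResult {
  if (outcome === "deleted_verified" || outcome === "already_absent") {
    return { local: { snapshot: null, merkleRoot: null }, cleared: true };
  }
  return {
    local,
    cleared: false,
    warning: `remote delete ${outcome}; local state preserved`,
  };
}
```

Keeping local state on a timeout means a retry still knows what was indexed, instead of leaving an orphaned remote collection with no local record.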

Example Flow

This is the intended architecture flow, not a simulated chat transcript.

Agent asks for the relevant implementation

search_codebase returns ranked behavior matches or exact identifier hits with freshness state, recommendedNextAction, capability confidence, and navigation hints.

Agent opens the exact symbol

file_outline and read_file.open_symbol resolve the implementation span from symbolInstanceId without guessing.

Agent checks callers and callees

call_graph provides nearby dependency context from compatible relationship sidecars.

Agent handles unsafe state explicitly

Requires-reindex, missing registry or relationship sidecar, noisy results, and backend timeout states return recovery guidance.

Agent edits with verified context

Satori does not edit code. It gives the coding agent enough repo evidence to avoid blind changes.