Satori Architecture

How Satori gives MCP coding agents repo-aware search, exact symbol reads, call-graph context, and freshness checks without hiding state or failures.

MCP runtime, vector search, call-graph sidecar, snapshot state

System Overview

Agent requests pass through a fixed six-tool registry. The MCP runtime enforces freshness and fingerprint gates and orchestrates search. Core sync supplies the deterministic file state that indexing relies on.

[Diagram: Agent (Claude / Codex / Cursor); Tool Registry (6 MCP tools); MCP Runtime (ToolHandlers and orchestration); Freshness Gate (snapshot and fingerprint gates); Vector Store (Zilliz / Milvus); Call-Graph Sidecar (AST graph artifacts); Core Sync (file sync / Merkle / hash)]

Agent

Claude Code, Codex, Cursor-style workflows, or another MCP client.

Tool Registry

Exposes six agent-safe operations instead of a wide internal API.

MCP Runtime

Coordinates indexing, search, sync, freshness, and lifecycle state.

Storage

Vector store, call-graph sidecar, local snapshots, and Merkle state.

@zokizuan/satori-core

Owns indexing, sync, chunking, metadata, and retrieval primitives.

@zokizuan/satori-mcp

Owns the MCP server, six-tool contract, lifecycle state, and agent envelopes.

@zokizuan/satori-cli

Installs managed client config, copies first-party skills, and exposes shell tool calls.

Runtime Boundaries

MCP Runtime Owns

  • Tool contracts and schema validation.
  • Runtime snapshot state and lifecycle status.
  • Index freshness, fingerprint gates, and recovery guidance.
  • Agent-safe responses, warnings, and refusal paths.
  • Coordination across vector store, sidecar, and core engine.

Core Engine Owns

  • Stat-first sync, hash-on-change, and deterministic Merkle roots.
  • Path normalization and snapshot diff primitives.
  • Partial-scan safety for unreadable files and directories.
  • AST-aware chunking, metadata, and ignore filtering.
  • Low-level indexing/search primitives behind MCP orchestration.

Agent-Safe Tool Surface

Satori exposes six MCP tools. The point is not a large API; it is a small set of operations that return state, warnings, and next steps when context is unsafe. The MCP server does not expose source-code write tools.

Discovery

list_codebases

Lists tracked roots grouped by ready, indexing, failed, or requires-reindex state.

Refuses to flatten state into a single success/failure bit.

search_codebase

Finds relevant code with scoped semantic search, grouping, freshness checks, and navigation hints.

Refuses stale or unsafe context when index gates require attention.

Inspection

file_outline

Maps symbols in a file and resolves exact labels or IDs through sidecar metadata.

Refuses to guess when exact symbol resolution is ambiguous.

call_graph

Traverses callers and callees from a search-provided symbolRef.

Refuses to fake graph data when the sidecar is absent or not ready.

read_file

Reads files, line ranges, or exact symbols with optional annotated outline status.

Refuses unlimited context dumps; large reads are bounded.

Lifecycle

manage_index

Creates, syncs, reindexes, checks status, clears, and handles force-reindex recovery paths.

Refuses destructive ambiguity: clear is explicit and backend timeouts preserve local state.

State and Safety

Satori treats index state as part of the contract. Agents do not just receive search results; they also receive status, freshness, and recovery guidance when the index is unsafe or incomplete.

[State diagram. States: not_indexed, indexing, ready, failed, sync_completed, requires_reindex. Transition labels: manage_index(create), success, failure, manage_index(sync), auto-sync, fingerprint mismatch, manage_index(reindex), manage_index(clear) from any state.]
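
The lifecycle in the diagram can be sketched as a small transition table. The state names come from the diagram; the event names and the exact source and target of each edge are assumptions made for illustration, since the diagram's arrows are not recoverable here. Only the "clear from any state" rule is stated directly.

```typescript
// States are taken from the diagram.
type IndexState =
  | "not_indexed"
  | "indexing"
  | "ready"
  | "failed"
  | "sync_completed"
  | "requires_reindex";

// Hypothetical event names standing in for the diagram's edge labels.
type IndexEvent =
  | "create"
  | "success"
  | "failure"
  | "sync"
  | "fingerprint_mismatch"
  | "reindex"
  | "clear";

// ASSUMED reading of the edges; the real runtime may allow more or fewer.
const TRANSITIONS: Record<IndexState, Partial<Record<IndexEvent, IndexState>>> = {
  not_indexed: { create: "indexing" },
  indexing: { success: "ready", failure: "failed" },
  ready: { sync: "sync_completed", fingerprint_mismatch: "requires_reindex" },
  sync_completed: { sync: "sync_completed", fingerprint_mismatch: "requires_reindex" },
  failed: { reindex: "indexing" },
  requires_reindex: { reindex: "indexing" },
};

// Returns the next state, or null when the event is not valid in this state.
function nextState(state: IndexState, event: IndexEvent): IndexState | null {
  if (event === "clear") return "not_indexed"; // "from any state" in the diagram
  return TRANSITIONS[state][event] ?? null;
}
```

Returning null for an unlisted pair mirrors the refusal style described above: the caller gets an explicit "not allowed here" instead of a silent no-op.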

Fingerprint Fields

{ embeddingProvider, embeddingModel, embeddingDimension, vectorStoreProvider, schemaVersion }

Safety Result

Prevents silent reads from incompatible embedding models/dimensions or schema-mode mismatches.
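
A fingerprint gate of this kind can be sketched as a field-by-field comparison. The five field names come from the list above; the field types, the `GateResult` shape, and the function name `checkFingerprint` are assumptions for illustration, not the package's actual API.

```typescript
// Field names from "Fingerprint Fields"; the types are assumed.
interface IndexFingerprint {
  embeddingProvider: string;
  embeddingModel: string;
  embeddingDimension: number;
  vectorStoreProvider: string;
  schemaVersion: string;
}

// Hypothetical result shape: either the gate passes or it names what differs.
type GateResult =
  | { ok: true }
  | { ok: false; status: "requires_reindex"; mismatched: (keyof IndexFingerprint)[] };

const FINGERPRINT_KEYS: (keyof IndexFingerprint)[] = [
  "embeddingProvider",
  "embeddingModel",
  "embeddingDimension",
  "vectorStoreProvider",
  "schemaVersion",
];

// Compare the fingerprint stored with the index against the running configuration.
function checkFingerprint(indexed: IndexFingerprint, runtime: IndexFingerprint): GateResult {
  const mismatched = FINGERPRINT_KEYS.filter((key) => indexed[key] !== runtime[key]);
  if (mismatched.length === 0) return { ok: true };
  return { ok: false, status: "requires_reindex", mismatched };
}
```

Reporting the mismatched fields, instead of a bare boolean, lets the agent-facing response explain why a reindex is needed.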

Search and Sync Behavior

The behavior below is the core trust model: search is freshness-aware, sync is incremental, and cloud lifecycle operations are verified before local state is discarded.

Query -> search -> exact reads -> freshness check -> agent guidance

Search Pipeline

Freshness gate -> operator parse -> dense/BM25 retrieval -> deterministic filters -> rerank policy -> grouping -> stable tie-breaks.

Hybrid Retrieval

Dense vector hits and BM25 sparse keyword hits merge through reciprocal rank fusion, so concepts and exact tokens both count.
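
Reciprocal rank fusion can be sketched in a few lines. Each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is a commonly used default and is an assumption here, as are the function name and the use of plain string IDs for chunks; the tie-break by ID is only one way to keep ordering stable.

```typescript
// Merge several ranked lists of result IDs (best first) into one ranking.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60, // assumed default; the real value may differ
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Higher score first; equal scores fall back to ID order so output is stable.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

// Usage: "a" is ranked well by both lists, so it ends up first.
const denseHits = ["a", "b", "c"];
const sparseHits = ["c", "a", "d"];
const fused = reciprocalRankFusion([denseHits, sparseHits]);
```

Because only ranks are used, the dense similarity scores and BM25 scores never need to be normalized onto a common scale.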

Language Capabilities

TypeScript, JavaScript, and Python have the strongest graph and outline support. Other languages use AST chunking or text fallback with explicit unsupported states.

Ignore Rules

Uses .gitignore, .satoriignore, and configured patterns. Signature checks converge even without watcher events.

Incremental Sync

Stat-first scan, hash-on-change, deterministic Merkle root, and vector updates only for added, removed, or modified files.
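
The sync steps above can be sketched over in-memory data. Everything here is illustrative: the type names, `diffSnapshot`, and `merkleRoot` are hypothetical, and a 32-bit FNV-1a hash stands in for whatever cryptographic hash the core engine uses so that the sketch has no dependencies.

```typescript
interface FileStat { path: string; size: number; mtimeMs: number }
interface SnapshotEntry extends FileStat { hash: string }
type Snapshot = Map<string, SnapshotEntry>;

// Stand-in hash (FNV-1a, 32-bit). A real implementation would use e.g. SHA-256.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Stat-first, hash-on-change: file contents are read only when size or mtime differ.
function diffSnapshot(
  previous: Snapshot,
  current: FileStat[],
  readFile: (path: string) => string,
) {
  const next: Snapshot = new Map();
  const added: string[] = [];
  const modified: string[] = [];
  for (const stat of current) {
    const old = previous.get(stat.path);
    const hash =
      old !== undefined && old.size === stat.size && old.mtimeMs === stat.mtimeMs
        ? old.hash // stat unchanged: reuse the stored hash, skip the read
        : fnv1a(readFile(stat.path));
    next.set(stat.path, { ...stat, hash });
    if (old === undefined) added.push(stat.path);
    else if (old.hash !== hash) modified.push(stat.path);
  }
  const removed = Array.from(previous.keys()).filter((p) => !next.has(p));
  return { next, added, modified, removed };
}

// Deterministic root: leaves are sorted by path, so scan order does not matter.
function merkleRoot(snapshot: Snapshot): string {
  let level = Array.from(snapshot.values())
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((entry) => fnv1a(entry.path + "\0" + entry.hash));
  if (level.length === 0) return fnv1a("");
  while (level.length > 1) {
    const parents: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // An odd node out is paired with itself.
      parents.push(fnv1a(level[i] + (level[i + 1] ?? level[i])));
    }
    level = parents;
  }
  return level[0];
}
```

Only the paths in `added`, `modified`, and `removed` would need vector updates; comparing Merkle roots gives a cheap check that nothing changed at all.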

Provider Profiles

Cloud and local providers use different result budgets. Rerank is capability-driven, docs scope skips rerank, and failures degrade with warnings.

Subdirectory Roots

Search can start from a package path inside an indexed parent. Satori resolves the effective root while keeping navigation fallbacks executable.
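
One way to resolve an effective root is to pick the longest indexed root that contains the requested path. The sketch below assumes that rule, plus simple string-based path normalization; the function name and return shape are hypothetical.

```typescript
// Pick the deepest indexed root containing the requested path, plus the
// remaining path inside that root to use as a search scope.
function resolveEffectiveRoot(
  requested: string,
  indexedRoots: string[],
): { root: string; relativeScope: string } | null {
  // Normalize separators and drop trailing slashes (simplified for the sketch).
  const normalize = (p: string): string => p.replace(/\\/g, "/").replace(/\/+$/, "");
  const target = normalize(requested);
  let best: string | null = null;
  for (const root of indexedRoots.map(normalize)) {
    // Require a full path-segment match so "/repo" does not claim "/repository".
    const contains = target === root || target.startsWith(root + "/");
    if (contains && (best === null || root.length > best.length)) best = root;
  }
  if (best === null) return null;
  return { root: best, relativeScope: target.slice(best.length + 1) };
}
```

Returning null when no root matches leaves it to the caller to report an unindexed path explicitly.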

Cloud Repair

Remote collections repair local ready state only with valid completion marker proof, matching codebase path, and matching runtime fingerprint.

Verified Delete

Clear and force-reindex mutate local snapshot/Merkle state only after remote deletion is verified or the collection is already absent.
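
The ordering described here, remote first and local only after verification, can be sketched as follows. The `RemoteBackend` interface, its method names, and `verifiedClear` are hypothetical, and the calls are synchronous only to keep the sketch short; real backend calls would be asynchronous.

```typescript
// Hypothetical backend interface for illustration.
interface RemoteBackend {
  hasCollection(name: string): boolean;
  dropCollection(name: string): void; // may throw, e.g. on timeout
}

type ClearOutcome =
  | { cleared: true }
  | { cleared: false; reason: string };

// Local state is touched only after the remote collection is confirmed gone.
function verifiedClear(
  collection: string,
  remote: RemoteBackend,
  localState: Map<string, unknown>, // stands in for snapshot/Merkle state
): ClearOutcome {
  try {
    if (remote.hasCollection(collection)) {
      remote.dropCollection(collection);
      // Verify the deletion instead of trusting that the call succeeded.
      if (remote.hasCollection(collection)) {
        return { cleared: false, reason: "remote collection still present" };
      }
    }
    // If the collection was already absent, fall through to the local cleanup.
  } catch (error) {
    // Timeout or backend failure: keep local state so it can be retried.
    return { cleared: false, reason: "backend error: " + String(error) };
  }
  localState.delete(collection);
  return { cleared: true };
}
```

Keeping local state on failure means a later retry still knows which collection it was trying to remove.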

Example Flow

This is the intended architecture flow, not a simulated chat transcript.

Agent asks for the relevant implementation

search_codebase returns ranked code groups with freshness state and navigation hints.

Agent opens the exact symbol

file_outline and read_file.open_symbol resolve the implementation span without guessing.

Agent checks callers and callees

call_graph provides nearby dependency context when the sidecar is ready.

Agent handles unsafe state explicitly

Requires-reindex, missing sidecar, noisy results, and backend timeout states return recovery guidance.

Agent edits with verified context

Satori does not edit code. It gives the coding agent enough repo evidence to avoid blind changes.