Satori | Architecture

System Overview

Agent requests route through a fixed registry of six tools. The MCP runtime enforces freshness and fingerprint gates and orchestrates search. Core sync provides a deterministic record of file state for indexing.

Agent

Claude Code, Codex, Cursor-style workflows, or another MCP client.

Tool Registry

Exposes six agent-safe operations instead of a wide internal API.

MCP Runtime

Coordinates indexing, search, sync, freshness, and lifecycle state.

Storage

Vector store, call-graph sidecar, local snapshots, and Merkle state.

`@zokizuan/satori-core`

Owns indexing, sync, chunking, metadata, and retrieval primitives.

`@zokizuan/satori-mcp`

Owns the MCP server, six-tool contract, lifecycle state, and agent envelopes.

`@zokizuan/satori-cli`

Installs managed client config, copies first-party skills, and exposes shell tool calls.

Runtime Boundaries

MCP Runtime Owns

Tool contracts and schema validation.
Runtime snapshot state and lifecycle status.
Index freshness, fingerprint gates, and recovery guidance.
Agent-safe responses, warnings, and refusal paths.
Coordination across vector store, sidecar, and core engine.

Core Engine Owns

Stat-first sync, hash-on-change, and deterministic Merkle roots.
Path normalization and snapshot diff primitives.
Partial-scan safety for unreadable files and directories.
AST-aware chunking, metadata, and ignore filtering.
Low-level indexing/search primitives behind MCP orchestration.

Agent-Safe Tool Surface

Satori exposes six MCP tools. The point is not a large API; it is a small set of operations that return state, warnings, and next steps when context is unsafe. The MCP server does not expose source-code write tools.

Discovery

`list_codebases`

Lists tracked roots grouped by ready, indexing, failed, or requires-reindex state.

Refuses to flatten state into a single success/failure bit.
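A minimal sketch of grouping tracked roots by state instead of collapsing them into one boolean. The `IndexState` values, `TrackedRoot` shape, and `groupByState` helper are hypothetical names chosen for illustration, not Satori's actual types.

```typescript
// Hypothetical state names mirroring the four states listed above.
type IndexState = "ready" | "indexing" | "failed" | "requires_reindex";

interface TrackedRoot {
  path: string;
  state: IndexState;
}

// Every state gets its own bucket, so an agent can see exactly which roots need attention.
function groupByState(roots: TrackedRoot[]): Record<IndexState, string[]> {
  const groups: Record<IndexState, string[]> = {
    ready: [],
    indexing: [],
    failed: [],
    requires_reindex: [],
  };
  for (const root of roots) groups[root.state].push(root.path);
  return groups;
}
```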

`search_codebase`

Finds relevant code with scoped semantic search, grouping, freshness checks, and navigation hints.

Refuses to return stale or unsafe context when an index gate requires attention.

Inspection

`file_outline`

Maps symbols in a file and resolves exact labels or IDs through sidecar metadata.

Refuses to guess when exact symbol resolution is ambiguous.
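A sketch of "resolve exactly or refuse", assuming outline entries carry an ID and a label. `OutlineSymbol`, `Resolution`, and `resolveSymbol` are hypothetical; the real sidecar metadata schema is not described on this page.

```typescript
// Hypothetical outline entry; Satori's sidecar schema may differ.
interface OutlineSymbol {
  id: string;
  label: string;
  startLine: number;
  endLine: number;
}

type Resolution =
  | { status: "resolved"; symbol: OutlineSymbol }
  | { status: "not_found" }
  | { status: "ambiguous"; candidates: OutlineSymbol[] };

// Exact ID first, then exact label; there is deliberately no fuzzy fallback.
function resolveSymbol(outline: OutlineSymbol[], query: string): Resolution {
  const byId = outline.find((s) => s.id === query);
  if (byId) return { status: "resolved", symbol: byId };
  const byLabel = outline.filter((s) => s.label === query);
  if (byLabel.length === 1) return { status: "resolved", symbol: byLabel[0] };
  if (byLabel.length === 0) return { status: "not_found" };
  // Several symbols share the label: return the candidates instead of picking one.
  return { status: "ambiguous", candidates: byLabel };
}
```

Returning the candidate list lets the agent retry with an exact ID rather than reading the wrong span.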

`call_graph`

Traverses callers and callees from a search-provided symbolRef.

Refuses to fake graph data when the sidecar is absent or not ready.

`read_file`

Reads files, line ranges, or exact symbols with optional annotated outline status.

Refuses unlimited context dumps; large reads are bounded.
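One way to bound a read is to clamp the requested range to the file and to a line budget, and to report when the result was cut short. The 200-line cap and the `boundedRead` function are assumptions for illustration; Satori's real read budget is not stated here.

```typescript
// Hypothetical cap; the actual budget is not specified on this page.
const MAX_READ_LINES = 200;

interface BoundedRead {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  truncated: boolean; // true when the budget cut the requested range short
  text: string;
}

// Clamp a requested 1-based line range to the file length and the read budget.
function boundedRead(source: string, start = 1, end = Infinity, maxLines = MAX_READ_LINES): BoundedRead {
  const lines = source.split("\n");
  const from = Math.max(1, Math.min(start, lines.length));
  const requestedEnd = Math.min(end, lines.length);
  const to = Math.min(requestedEnd, from + maxLines - 1);
  return {
    startLine: from,
    endLine: to,
    truncated: to < requestedEnd,
    text: lines.slice(from - 1, to).join("\n"),
  };
}
```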

Lifecycle

`manage_index`

Creates, syncs, reindexes, checks status, clears, and handles force-reindex recovery paths.

Refuses destructive ambiguity: clear is explicit and backend timeouts preserve local state.
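The six tool names above can be pinned as a closed set, with every response carrying status, warnings, and next steps. The tool names come from this page; the `ToolEnvelope` fields and the `refuse` helper are an assumed shape, not Satori's published response format.

```typescript
// Tool names from this page; everything else in this block is hypothetical.
const TOOLS = [
  "list_codebases",
  "search_codebase",
  "file_outline",
  "call_graph",
  "read_file",
  "manage_index",
] as const;
type ToolName = (typeof TOOLS)[number];

interface ToolEnvelope<T> {
  tool: ToolName;
  status: "ok" | "refused" | "degraded";
  data?: T; // only present when the tool could answer safely
  warnings: string[]; // e.g. rerank fallback, partial scan
  nextSteps: string[]; // recovery guidance for the agent
}

// A refusal carries no data, only the reason and what the agent should do next.
function refuse(tool: ToolName, reason: string, nextSteps: string[]): ToolEnvelope<never> {
  return { tool, status: "refused", warnings: [reason], nextSteps };
}

// Example: call_graph without a ready sidecar refuses instead of inventing edges.
const sidecarMissing = refuse("call_graph", "call-graph sidecar is not ready", [
  "Check index status with manage_index, then retry call_graph.",
]);
```

Typing the tool name as a union of six literals means a seventh tool cannot be added without changing the contract itself.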

State and Safety

Satori treats index state as part of the contract. Agents do not just receive search results; they also receive status, freshness, and recovery guidance when the index is unsafe or incomplete.

Fingerprint Fields

{ embeddingProvider, embeddingModel, embeddingDimension, vectorStoreProvider, schemaVersion }

Safety Result

Prevents silent reads from incompatible embedding models/dimensions or schema-mode mismatches.
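A sketch of the fingerprint gate: compare the fingerprint stored with the index to the current runtime fingerprint field by field, and block reads on any difference. The field names are the ones listed above; the `schemaVersion` type, `fingerprintMismatches`, and `gateSearch` are assumptions for illustration.

```typescript
// Field names from this page; value types are assumed.
interface IndexFingerprint {
  embeddingProvider: string;
  embeddingModel: string;
  embeddingDimension: number;
  vectorStoreProvider: string;
  schemaVersion: number; // assumed numeric
}

const FINGERPRINT_FIELDS: (keyof IndexFingerprint)[] = [
  "embeddingProvider",
  "embeddingModel",
  "embeddingDimension",
  "vectorStoreProvider",
  "schemaVersion",
];

// List every field that differs, so the refusal can say exactly why.
function fingerprintMismatches(indexed: IndexFingerprint, runtime: IndexFingerprint): (keyof IndexFingerprint)[] {
  return FINGERPRINT_FIELDS.filter((field) => indexed[field] !== runtime[field]);
}

// Any mismatch blocks reads instead of mixing incompatible vectors.
function gateSearch(indexed: IndexFingerprint, runtime: IndexFingerprint): "ready" | "requires_reindex" {
  return fingerprintMismatches(indexed, runtime).length === 0 ? "ready" : "requires_reindex";
}
```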

Search And Sync Behavior

The behavior below is shown directly because it is the core trust model: search is freshness-aware, sync is incremental, and cloud lifecycle operations are verified before local state is discarded.

Query -> search -> exact reads -> freshness check -> agent guidance

Search Pipeline

Freshness gate -> operator parse -> dense/BM25 retrieval -> deterministic filters -> rerank policy -> grouping -> stable tie-breaks.
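The last two pipeline stages, grouping and stable tie-breaks, can be sketched as a total ordering plus file grouping. The `Hit` shape and the choice of path and start line as tie-breakers are assumptions; the page does not say which keys Satori uses.

```typescript
// Hypothetical hit shape; tie-break keys are assumed, chosen only for determinism.
interface Hit {
  path: string;
  startLine: number;
  score: number;
}

// Score descending, then path and line ascending, so equal scores always sort the same way.
// Plain < comparison avoids locale-dependent ordering.
function compareHits(a: Hit, b: Hit): number {
  if (b.score !== a.score) return b.score - a.score;
  if (a.path !== b.path) return a.path < b.path ? -1 : 1;
  return a.startLine - b.startLine;
}

// Group by file after a global sort; Map keeps insertion order,
// so groups come out ordered by their best hit.
function groupByFile(hits: Hit[]): Hit[][] {
  const groups = new Map<string, Hit[]>();
  for (const hit of hits.slice().sort(compareHits)) {
    const group = groups.get(hit.path);
    if (group) group.push(hit);
    else groups.set(hit.path, [hit]);
  }
  return Array.from(groups.values());
}
```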

Hybrid Retrieval

Dense vector hits and BM25 sparse keyword hits merge through reciprocal rank fusion, so concepts and exact tokens both count.
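Reciprocal rank fusion scores each document as the sum of 1 / (k + rank) over every list it appears in, so a document ranked moderately by both retrievers can beat one ranked first by only one. This sketch uses k = 60, a common default; the constant Satori uses is not stated on this page, and `reciprocalRankFusion` is an illustrative name.

```typescript
// RRF: score(d) = sum over ranked lists of 1 / (k + rank(d)), with 1-based ranks.
// k = 60 is a common default, assumed here.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Sort by fused score, breaking exact ties by id so the output is deterministic.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
}
```

For example, with dense hits `["a", "b", "c"]` and BM25 hits `["b", "d"]`, `b` ranks first because it appears in both lists.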

Language Capabilities

TypeScript, JavaScript, and Python have the strongest graph and outline support. Other languages use AST chunking or text fallback with explicit unsupported states.

Ignore Rules

Uses .gitignore, .satoriignore, and configured patterns. Signature checks on the ignore rules let the index converge even when no file-watcher event fires.
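One way signature checks can converge without watcher events: hash every ignore source together and compare against the stored signature on each sync. `IgnoreSources`, `ignoreSignature`, and `ignoreRulesChanged` are hypothetical; this illustrates the idea rather than Satori's implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: raw ignore sources as read from disk and config.
interface IgnoreSources {
  gitignore: string; // contents of .gitignore ("" if absent)
  satoriignore: string; // contents of .satoriignore ("" if absent)
  configured: string[]; // extra patterns from configuration, order preserved
}

// Hash all ignore sources together; an edit to any of them changes the signature.
function ignoreSignature(sources: IgnoreSources): string {
  const payload = JSON.stringify([sources.gitignore, sources.satoriignore, sources.configured]);
  return createHash("sha256").update(payload).digest("hex");
}

// Checked on every sync, so rule changes are noticed even if no watcher event fired.
function ignoreRulesChanged(previousSignature: string | undefined, current: IgnoreSources): boolean {
  return previousSignature !== ignoreSignature(current);
}
```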

Incremental Sync

Stat-first scan, hash-on-change, deterministic Merkle root, and vector updates only for added, removed, or modified files.
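A sketch of the sync steps named above: reuse the stored hash when size and mtime are unchanged, hash only files whose stat data moved, diff against the previous snapshot, and fold sorted leaves into a Merkle root. The snapshot layout, the `ScannedFile.read` callback, and the odd-node pairing rule are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical snapshot entry: stat fields plus a content hash.
interface FileEntry {
  size: number;
  mtimeMs: number;
  hash: string;
}
type Snapshot = Map<string, FileEntry>; // keyed by normalized relative path

interface ScannedFile {
  path: string;
  size: number;
  mtimeMs: number;
  read: () => string; // only called when stat data changed
}

const sha256 = (data: string) => createHash("sha256").update(data).digest("hex");

function syncSnapshot(previous: Snapshot, scanned: ScannedFile[]) {
  const next: Snapshot = new Map();
  const added: string[] = [];
  const modified: string[] = [];
  for (const file of scanned) {
    const prev = previous.get(file.path);
    // Stat-first: same size and mtime reuses the stored hash without reading the file.
    if (prev && prev.size === file.size && prev.mtimeMs === file.mtimeMs) {
      next.set(file.path, prev);
      continue;
    }
    // Hash-on-change: only files whose stat data moved are read and hashed.
    const hash = sha256(file.read());
    next.set(file.path, { size: file.size, mtimeMs: file.mtimeMs, hash });
    if (!prev) added.push(file.path);
    else if (prev.hash !== hash) modified.push(file.path); // a touch with identical content is not a change
  }
  const removed = Array.from(previous.keys()).filter((p) => !next.has(p));
  return { next, added, modified, removed, root: merkleRoot(next) };
}

// Deterministic root: leaves are sorted by path, so scan order never changes the result.
function merkleRoot(snapshot: Snapshot): string {
  let level = Array.from(snapshot.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([path, entry]) => sha256(`${path}\0${entry.hash}`));
  if (level.length === 0) return sha256("");
  while (level.length > 1) {
    const parents: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // An odd node out is paired with itself.
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      parents.push(sha256(level[i] + right));
    }
    level = parents;
  }
  return level[0];
}
```

Only the `added`, `modified`, and `removed` lists would then drive vector updates.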

Provider Profiles

Cloud and local providers use different result budgets. Rerank runs only when the provider supports it, docs-scoped searches skip it, and rerank failures degrade gracefully with a warning instead of failing the search.
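A sketch of capability-driven rerank with a degrade path. The `RerankProvider` interface, the `Scope` values, and `maybeRerank` are hypothetical, and the rerank call is synchronous only to keep the example short; a real provider call would be asynchronous.

```typescript
// Hypothetical provider shape; Satori's provider profiles are not specified here.
interface RerankProvider {
  supportsRerank: boolean;
  rerank: (query: string, docs: string[]) => number[]; // one score per doc
}

type Scope = "code" | "docs";

function maybeRerank(query: string, docs: string[], scope: Scope, provider: RerankProvider) {
  const warnings: string[] = [];
  // Docs scope and providers without the capability keep the fused order.
  if (scope === "docs" || !provider.supportsRerank) return { docs, warnings };
  try {
    const scores = provider.rerank(query, docs);
    // Sort indices by rerank score; equal scores keep their fused order.
    const order = docs.map((_, i) => i).sort((a, b) => scores[b] - scores[a] || a - b);
    return { docs: order.map((i) => docs[i]), warnings };
  } catch (err) {
    // Degrade: keep the pre-rerank order and surface a warning instead of failing.
    warnings.push(`rerank failed, using fused order: ${String(err)}`);
    return { docs, warnings };
  }
}
```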

Subdirectory Roots

Search can start from a package path inside an indexed parent. Satori resolves the effective root while keeping navigation fallbacks executable.
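A sketch of resolving a subdirectory request to its indexed parent, assuming the deepest indexed root that contains the path wins and the remainder becomes a path scope. `resolveEffectiveRoot` and this selection rule are hypothetical; the page does not describe Satori's exact rule.

```typescript
// Hypothetical: choose the deepest indexed root containing the requested path.
function resolveEffectiveRoot(requested: string, indexedRoots: string[]) {
  // Normalize separators and drop trailing slashes before comparing.
  const norm = (p: string) => p.replace(/\\/g, "/").replace(/\/+$/, "");
  const target = norm(requested);
  const candidates = indexedRoots
    .map(norm)
    // Match on a path boundary so "/repo" does not claim "/repo2".
    .filter((root) => target === root || target.startsWith(root + "/"))
    .sort((a, b) => b.length - a.length);
  if (candidates.length === 0) return undefined;
  const root = candidates[0];
  // Scope is relative to the effective root; "" means the whole root.
  const scope = target === root ? "" : target.slice(root.length + 1);
  return { root, scope };
}
```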

Cloud Repair

Remote collections restore local ready state only when they present a valid completion marker, a matching codebase path, and a matching runtime fingerprint.
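The three repair conditions can be expressed as a decision that names the first failing proof. The `CompletionMarker` shape, including its `complete` flag and string fingerprint, and `decideCloudRepair` are hypothetical; the page names the proofs but not their format.

```typescript
// Hypothetical marker shape; only the three proofs come from this page.
interface CompletionMarker {
  complete: boolean;
  codebasePath: string;
  fingerprint: string; // e.g. a serialized runtime fingerprint
}

type RepairDecision = { repair: true } | { repair: false; reason: string };

// All three proofs must hold before a remote collection may mark local state ready.
function decideCloudRepair(
  marker: CompletionMarker | undefined,
  localPath: string,
  runtimeFingerprint: string,
): RepairDecision {
  if (!marker || !marker.complete) return { repair: false, reason: "no valid completion marker" };
  if (marker.codebasePath !== localPath) return { repair: false, reason: "codebase path mismatch" };
  if (marker.fingerprint !== runtimeFingerprint) return { repair: false, reason: "fingerprint mismatch" };
  return { repair: true };
}
```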

Verified Delete

Clear and force-reindex mutate local snapshot/Merkle state only after remote deletion is verified or the collection is already absent.
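A sketch of verify-then-clear ordering: delete remotely, confirm the collection is gone, and only then drop local snapshot and Merkle state. `RemoteStore`, `LocalState`, and `verifiedClear` are hypothetical interfaces, not a real vector-store client API.

```typescript
// Hypothetical remote client; the real vector-store API is not specified here.
interface RemoteStore {
  deleteCollection(name: string): Promise<void>;
  hasCollection(name: string): Promise<boolean>;
}

interface LocalState {
  clearSnapshotAndMerkle(name: string): void;
}

async function verifiedClear(
  name: string,
  remote: RemoteStore,
  local: LocalState,
): Promise<"cleared" | "kept_local_state"> {
  try {
    // An already-absent collection counts as verified.
    if (await remote.hasCollection(name)) {
      await remote.deleteCollection(name);
    }
    // Verify before touching local state: the collection must now be absent.
    if (await remote.hasCollection(name)) return "kept_local_state";
  } catch {
    // Timeout or backend error: deletion is unproven, so local state is preserved.
    return "kept_local_state";
  }
  local.clearSnapshotAndMerkle(name);
  return "cleared";
}
```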

Example Flow

This is the intended architecture flow, not a simulated chat transcript.

Agent asks for the relevant implementation

search_codebase returns ranked code groups with freshness state and navigation hints.

Agent opens the exact symbol

file_outline and read_file.open_symbol resolve the implementation span without guessing.

Agent checks callers and callees

call_graph provides nearby dependency context when the sidecar is ready.

Agent handles unsafe state explicitly

Requires-reindex, missing sidecar, noisy results, and backend timeout states return recovery guidance.

Agent edits with verified context

Satori does not edit code. It gives the coding agent enough repo evidence to avoid blind changes.
