Mnemosyne: Semantic Memory and Multi-Agent Orchestration for LLM Systems

Abstract

Large language models face fundamental limitations: context windows bound working memory, coordination between agents lacks persistence, and knowledge evaporates between sessions. Mnemosyne addresses these challenges through a production-ready semantic memory system with multi-agent orchestration.

Built in Rust with LibSQL storage, it provides low-latency retrieval (0.88ms list operations, 1.61ms hybrid search), LLM-guided memory evolution, and a four-agent coordination framework composed of Orchestrator, Optimizer, Reviewer, and Executor agents. The system integrates with Claude Code via the Model Context Protocol (MCP), automatic hooks, and real-time monitoring.

Hybrid search combines keyword matching (FTS5), graph traversal, and vector similarity with weighted scoring. Together with privacy-preserving evaluation and comprehensive testing, this provides persistent context across sessions, autonomous agent coordination, and continuous memory optimization in a production deployment.

This paper presents the architecture, validates claims against tagged source code (v2.2.0), compares with existing solutions (MemGPT, Mem0, LangChain Memory), and demonstrates production readiness through comprehensive testing and real-world integration.

The Challenge

Context Window Mathematics

Context windows constrain LLM working memory. Modern systems provide 32K-200K tokens, but at the low end of that range effective memory drops to 10-15K tokens after system instructions (2-3K), conversation history (3-10K), and code context (10-20K). That is roughly 10-15 pages of unique information.

Cost compounds the challenge. GPT-4 charges $0.03/1K input tokens, so a 32K context costs about $0.96 (roughly $1) per request. Repeated loading across sessions creates financial pressure to minimize context.
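The arithmetic in this section can be checked with a short sketch. The price and overhead figures come from the text; the function names are illustrative, and 32K is treated as 32,000 tokens for simplicity.

```python
# Figures taken from the text; function names are illustrative only.
PRICE_PER_1K_INPUT_TOKENS = 0.03  # USD, the GPT-4 input price quoted above


def request_cost(context_tokens: int) -> float:
    """Cost in USD of sending `context_tokens` input tokens in one request."""
    return context_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS


def remaining_tokens(window: int, system: int, history: int, code: int) -> int:
    """Tokens left for new information after fixed overheads, floored at zero."""
    return max(0, window - system - history - code)


if __name__ == "__main__":
    print(f"32K request: ${request_cost(32_000):.2f}")
    # Light overheads (2K + 3K + 10K) versus heavy overheads (3K + 10K + 20K).
    print("light overheads:", remaining_tokens(32_000, 2_000, 3_000, 10_000))
    print("heavy overheads:", remaining_tokens(32_000, 3_000, 10_000, 20_000))
```

With heavy overheads a 32K window is fully consumed before any new information is added, which is the pressure this section describes.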

The Re-initialization Tax

Every session starts with zero context. Developers spend 5-13 minutes reconstructing relevant information: identifying files (2-3 min), explaining tasks (1-2 min), providing architectural context (2-5 min), and referencing decisions (0-3 min).

For 40 sessions over a 2-week feature (4/day), that's 200-520 minutes (3.3-8.7 hours) spent on context management. At $100/hour, inefficiency costs $330-$870 per feature.
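As a quick check of these totals, the per-step ranges can be summed and scaled. Summing them gives 5 to 13 minutes per session, and 40 sessions at that rate reproduce the 200-520 minute figure. The names below are illustrative.

```python
# Per-session reconstruction steps from the text, as (low, high) minutes.
STEPS = {
    "identify files": (2, 3),
    "explain task": (1, 2),
    "architectural context": (2, 5),
    "reference decisions": (0, 3),
}


def per_session_range() -> tuple:
    """Sum the low and high estimates across all steps."""
    low = sum(lo for lo, _ in STEPS.values())
    high = sum(hi for _, hi in STEPS.values())
    return low, high


def reinit_cost(sessions: int, minutes_per_session: float, hourly_rate: float) -> tuple:
    """Return (total minutes, hours, dollars) spent rebuilding context."""
    minutes = sessions * minutes_per_session
    hours = minutes / 60
    return minutes, hours, hours * hourly_rate


if __name__ == "__main__":
    for minutes_per_session in per_session_range():
        minutes, hours, dollars = reinit_cost(40, minutes_per_session, 100)
        print(f"{minutes} min = {hours:.1f} h = ${dollars:.0f}")
```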

Multi-Agent Coordination Failures

Without shared memory, coordination breaks down in several ways. Agent A completes work but Agent B cannot access the results, so they must be re-transmitted. Race conditions occur when agents duplicate work without knowledge of each other. Deadlocks happen when agents wait on each other circularly. Without audit trails, debugging these coordination failures becomes impossible.

Architecture

Core Memory System

Hybrid Search: Three complementary techniques provide multi-modal retrieval:

FTS5 Keyword Search (20% weight): SQLite's full-text search with BM25 ranking, <0.5ms typical latency
Graph Expansion (10% weight): Recursive CTEs traverse memory links with strength weighting, ~5ms for 1-hop traversal
Vector Semantics (70% weight, planned v2.2+): Embedding-based similarity using fastembed or Voyage AI
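The weighting above can be sketched as a weighted sum over per-memory signals. This is a minimal illustration, not the project's scoring code: the `Signals` type is hypothetical, each signal is assumed to be normalized to [0, 1], and renormalizing over the available signals when no vector score exists is an assumption made here because the text marks vector search as planned.

```python
from dataclasses import dataclass
from typing import Optional

# Weights quoted in the text: keyword 20%, graph 10%, vector 70%.
WEIGHTS = {"keyword": 0.20, "graph": 0.10, "vector": 0.70}


@dataclass
class Signals:
    """Per-memory scores, each assumed to be normalized to [0, 1]."""
    keyword: float = 0.0
    graph: float = 0.0
    vector: Optional[float] = None  # None when no embedding score is available


def hybrid_score(s: Signals) -> float:
    """Weighted sum of signals, renormalized over those present (an assumption)."""
    parts = {"keyword": s.keyword, "graph": s.graph}
    if s.vector is not None:
        parts["vector"] = s.vector
    total_weight = sum(WEIGHTS[name] for name in parts)
    return sum(WEIGHTS[name] * value for name, value in parts.items()) / total_weight


if __name__ == "__main__":
    print(hybrid_score(Signals(keyword=0.8, graph=0.5, vector=0.9)))
    print(hybrid_score(Signals(keyword=0.8, graph=0.5)))  # keyword + graph only
```

Renormalizing keeps scores comparable whether or not embeddings are present; a system could equally leave the vector term at zero, which would cap keyword-plus-graph scores at 0.3.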

Storage: LibSQL provides ACID guarantees, B-tree indexes on namespace/importance/created_at, FTS5 virtual tables, and ~800KB per 1,000 memories. Performance: 0.5ms get-by-ID, 0.88ms list-recent, 1.61ms hybrid-search.
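To make the storage layout and graph expansion concrete, the sketch below uses Python's built-in `sqlite3` module as a stand-in for LibSQL, which is SQLite-compatible. The table names, column names, and sample rows are assumptions for illustration, and multiplying link strengths along a path is one plausible reading of "strength weighting", not a confirmed detail of the implementation.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    namespace TEXT NOT NULL,
    importance REAL NOT NULL,
    created_at TEXT NOT NULL,
    content TEXT NOT NULL
);
-- Ordinary SQLite indexes are B-trees.
CREATE INDEX idx_memories_namespace ON memories(namespace);
CREATE INDEX idx_memories_importance ON memories(importance);
CREATE INDEX idx_memories_created_at ON memories(created_at);
CREATE TABLE links (
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    strength REAL NOT NULL
);
"""

# Recursive CTE: follow outgoing links up to a hop limit, multiplying link
# strengths along each path so weaker chains rank lower.
TRAVERSE = """
WITH RECURSIVE reach(id, hops, score) AS (
    SELECT ?, 0, 1.0
    UNION ALL
    SELECT l.target_id, r.hops + 1, r.score * l.strength
    FROM reach AS r JOIN links AS l ON l.source_id = r.id
    WHERE r.hops < ?
)
SELECT id, MIN(hops) AS min_hops, MAX(score) AS best_score
FROM reach
WHERE hops > 0
GROUP BY id
ORDER BY best_score DESC, id
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with four memories and three links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO memories VALUES (?, 'demo', 5.0, '2025-01-01', ?)",
        [(name, f"memory {name}") for name in "abcd"],
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?)",
        [("a", "b", 0.9), ("b", "c", 0.5), ("a", "d", 0.3)],
    )
    return conn


def related(conn: sqlite3.Connection, start_id: str, max_hops: int = 1) -> list:
    """Return (id, min_hops, best_score) rows reachable within max_hops."""
    return conn.execute(TRAVERSE, (start_id, max_hops)).fetchall()


if __name__ == "__main__":
    demo = build_demo()
    print(related(demo, "a", max_hops=1))
    print(related(demo, "a", max_hops=2))
```

The hop limit bounds the recursion, so the query terminates even if the link graph contains cycles.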

System Architecture

[Diagram: high-level component architecture and data flow through the system.]

Hybrid Search Architecture

Mnemosyne uses a three-strategy hybrid search system combining FTS5, graph traversal, and vector similarity.

Data Flow

[Diagram: end-to-end data movement from user input through processing, storage, and retrieval.]

Four-Agent Framework

Specialized agents coordinate through Ractor actor supervision:

Orchestrator: Prioritized work queue (0=highest), dependency tracking, 60s deadlock detection with cycle resolution
Optimizer: Context budget allocation (40% critical, 30% skills, 20% project, 10% general), dynamic skill discovery, prefetching
Reviewer: Quality gates (intent satisfied, tests passing, docs complete, no anti-patterns), DSPy-based semantic validation
Executor: Work execution with timeout/retry, sub-agent spawning for parallel work, graceful failure with rollback
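The Orchestrator and Optimizer responsibilities listed above can be illustrated with a small sketch: a priority queue that releases only items whose dependencies are complete, a depth-first search for dependency cycles, and the 40/30/20/10 budget split. The actual system is built on Ractor actors in Rust; the class and function names here are hypothetical, and a real orchestrator would presumably run the cycle check on a timer (the text mentions 60s) rather than on demand.

```python
import heapq
from typing import Dict, List, Optional, Set

# Context budget shares from the text, as integer percentages.
BUDGET_PERCENT = {"critical": 40, "skills": 30, "project": 20, "general": 10}


def allocate_budget(total_tokens: int) -> Dict[str, int]:
    """Split a token budget across categories using the quoted percentages."""
    return {name: total_tokens * pct // 100 for name, pct in BUDGET_PERCENT.items()}


class WorkQueue:
    """Priority queue (0 = highest) that respects dependencies between items."""

    def __init__(self) -> None:
        self._heap: List[tuple] = []  # (priority, insertion order, item id)
        self._deps: Dict[str, Set[str]] = {}
        self._done: Set[str] = set()
        self._counter = 0

    def submit(self, item_id: str, priority: int, deps: Optional[Set[str]] = None) -> None:
        self._deps[item_id] = set(deps or ())
        heapq.heappush(self._heap, (priority, self._counter, item_id))
        self._counter += 1

    def complete(self, item_id: str) -> None:
        self._done.add(item_id)

    def next_ready(self) -> Optional[str]:
        """Pop the highest-priority item whose dependencies are all complete."""
        blocked, ready = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if self._deps[entry[2]] <= self._done:
                ready = entry[2]
                break
            blocked.append(entry)
        for entry in blocked:  # put still-blocked items back
            heapq.heappush(self._heap, entry)
        return ready

    def find_cycle(self) -> Optional[List[str]]:
        """Return one dependency cycle among unfinished items, or None."""
        state = {node: "new" for node in self._deps}
        path: List[str] = []

        def visit(node: str) -> Optional[List[str]]:
            state[node] = "active"
            path.append(node)
            for dep in self._deps[node]:
                if dep in self._done or dep not in state:
                    continue
                if state[dep] == "active":  # back edge: cycle found
                    return path[path.index(dep):]
                if state[dep] == "new":
                    found = visit(dep)
                    if found:
                        return found
            path.pop()
            state[node] = "finished"
            return None

        for node in list(state):
            if state[node] == "new":
                found = visit(node)
                if found:
                    return found
        return None
```

A blocked high-priority item does not stall the queue: `next_ready` skips it and returns the best item that can actually run. How the Orchestrator resolves a detected cycle is not specified in the text, so the sketch only reports it.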

Multi-Agent Coordination Flow

[Diagram: interaction of the four agents during a typical work session.]

Autonomous Evolution

LLM-guided optimization runs during idle periods:

Consolidation: Claude Haiku analyzes memory pairs for merge/supersede/keep-both decisions
Importance Recalibration: Recency decay (e^(-age/30)), access boost (+0.1 per retrieval, max +2.0), graph proximity (+0.05 per neighbor)
Link Decay: -1% strength per day inactive, access reinforcement, prune <0.2 strength
Archival: Soft-delete memories with importance <2 AND age >90 days, or superseded memories after 7 days
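The numeric rules above translate into a few small functions. The constants come from the text, but several details are assumptions made for this sketch: age is measured in days, recency decay multiplies the base importance while the boosts are added, the 1% link decay compounds daily, and "after 7 days" is read as seven or more days since the memory was superseded.

```python
import math
from typing import Optional


def recalibrated_importance(base: float, age_days: float, retrievals: int, neighbors: int) -> float:
    """Combine recency decay, access boost, and graph proximity (combination assumed)."""
    recency = math.exp(-age_days / 30)        # e^(-age/30)
    access_boost = min(0.1 * retrievals, 2.0)  # +0.1 per retrieval, capped at +2.0
    proximity = 0.05 * neighbors               # +0.05 per linked neighbor
    return base * recency + access_boost + proximity


def decayed_strength(strength: float, inactive_days: int) -> float:
    """Apply 1% decay per inactive day, compounding (an assumption)."""
    return strength * 0.99 ** inactive_days


def should_prune(strength: float) -> bool:
    """Links below 0.2 strength are pruned."""
    return strength < 0.2


def should_archive(importance: float, age_days: float, superseded_days: Optional[float] = None) -> bool:
    """Soft-delete rule: low importance and old, or superseded for at least 7 days."""
    if superseded_days is not None and superseded_days >= 7:
        return True
    return importance < 2 and age_days > 90


if __name__ == "__main__":
    print(recalibrated_importance(5.0, age_days=30, retrievals=30, neighbors=2))
    print(decayed_strength(1.0, inactive_days=160), should_prune(decayed_strength(1.0, 160)))
    print(should_archive(1.5, age_days=120))
```

Under compounding decay, a link that starts at full strength and is never accessed stays above the 0.2 pruning threshold for about 160 days, since 0.99^160 is roughly 0.2.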

Technology Stack

Core: Rust 1.75+ (type safety, zero-cost abstractions), Tokio (async runtime), LibSQL (SQLite-compatible with vector support), PyO3 0.22 (Python bindings, 10-20x faster than subprocess)

LLM: Claude Haiku 4.5 for enrichment, linking, consolidation (<500ms typical, 4-5x cheaper than Sonnet)

Protocols: MCP (JSON-RPC 2.0 over stdio), SSE (real-time events), Ractor message passing
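For readers unfamiliar with MCP over stdio, the sketch below shows the message framing: each JSON-RPC 2.0 message is a single line of JSON terminated by a newline, and tool invocations use the `tools/call` method. The tool name `recall` and its arguments are hypothetical placeholders; the tools Mnemosyne actually exposes are not listed in this paper.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(tool: str, arguments: dict) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request for an MCP tools/call."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


def parse_response(line: str) -> dict:
    """Return the result of a JSON-RPC response, raising if it carries an error."""
    message = json.loads(line)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return message["result"]


if __name__ == "__main__":
    # "recall" and its arguments are hypothetical, used only to show the shape.
    print(tool_call_request("recall", {"query": "deadlock detection"}), end="")
```

A client would write the request line to the server's stdin and read one response line from its stdout, matching responses to requests by `id`.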

gRPC Remote Access (v2.2.0)

The RPC feature provides a production-ready gRPC server that lets external applications store, search, and manage memories remotely. It is built on Tonic and uses Protocol Buffers for type-safe, high-performance access.

RPC Architecture

Services:

MemoryService: 13 methods for CRUD operations, hybrid search (Recall, SemanticSearch, GraphTraverse), and streaming (RecallStream, ListMemoriesStream)
HealthService: System monitoring with HealthCheck, GetStats, GetMetrics, GetMemoryUsage, StreamMetrics, and GetVersion

Language Support: Python, Go, Rust, JavaScript, or any gRPC-compatible language via Protocol Buffers

Integrated Context Studio (ICS)

Terminal-based semantic editor with multi-panel UI, CRDT-based collaborative editing, and real-time validation.

Dashboard Monitoring

Real-time web-based monitoring of multi-agent orchestration, with a 6-panel TUI showing memory metrics, context usage, work progress, and agent coordination.

Comparison with Existing Systems

Feature	Mnemosyne	MemGPT	Mem0	LangChain Memory
Memory Model	Hybrid (FTS5 + Graph, Vector planned)	Virtual context (RAM/disk pages)	Graph nodes	Conversation buffers
Multi-Agent Coordination	4-agent framework	Single-agent focus	Limited (application layer)	None
Evolution System	Autonomous (LLM-guided)	Manual management	Limited automation	Manual cleanup
Integration	MCP + Hooks + CLI + Dashboard	Python library + API	REST API + SDKs	Python library
Implementation	Rust + Python bindings	Python	Python + Go	Python
Production Readiness	702 tests, type safety	Research/experimental	Beta (production-ready)	Production (stable)

Mnemosyne treats memory and agents as unified concerns. Where MemGPT provides sophisticated single-agent memory management and Mem0 offers production-grade graph storage, Mnemosyne integrates persistent memory with multi-agent orchestration and autonomous evolution.

Validation & Evidence

Test Coverage

702 passing tests (100% pass rate). The main categories are:

~250 unit tests: Type system, storage operations, evolution algorithms, serialization
~150 integration tests: MCP server, orchestration, DSPy bridge, LLM service
~80 E2E tests: Human workflows, agent coordination, recovery scenarios
~50 specialized tests: File descriptor safety, process management, ICS integration

Performance Metrics

Benchmarks from tests/performance/:

Store memory: 2.25ms (includes async LLM enrichment dispatch)
Get by ID: 0.5ms (direct UUID lookup)
List recent: 0.88ms (indexed query)
Hybrid search: 1.61ms (FTS5 + graph on 1K memories)
Graph traversal: ~5ms (1-hop), ~12ms (2-hop)

Production Readiness

Stability established through:

File descriptor leak prevention (commit 87b7a33): Hooks close all FDs, validation in test suite
Terminal corruption prevention (commit eec1a33): Clean process management, proper signal handling
Robust error handling: Result<T,E> throughout, custom error types, graceful degradation

Quality Gates

A multi-layered quality assurance process ensures production reliability. Every change must pass 8 validation gates.

Code Validation

Complete validation matrix available: validation.md

Every technical claim maps to v2.2.0 source code and tests. Sample mappings:

Sub-ms retrieval (0.88ms) → src/storage/libsql.rs:420-450 + tests
4-agent orchestration → src/orchestration/mod.rs:89-150 + tests

Summary

Mnemosyne demonstrates that semantic memory and multi-agent orchestration form a unified system. The architecture delivers persistent context through hybrid search, multi-agent coordination via specialized agents with Ractor supervision, autonomous evolution through LLM-guided consolidation, and production integration via MCP.

The system addresses fundamental challenges: context loss elimination (sessions maintain complete state), coordination infrastructure (shared memory enables debugging), cognitive load reduction (automatic context loading), and long-running workflow support (accumulation over weeks).