Mnemosyne: Semantic Memory and Multi-Agent Orchestration for LLM Systems

Version 2.2.0 · November 8, 2025 · github.com/rand/mnemosyne

Abstract

Large language models face fundamental limitations: context windows bound working memory, coordination between agents lacks persistence, and knowledge evaporates between sessions. Mnemosyne addresses these challenges through a production-ready semantic memory system with multi-agent orchestration.

Built in Rust with LibSQL storage, it provides sub-millisecond retrieval (0.88ms list operations, 1.61ms search), LLM-guided memory evolution, and a four-agent coordination framework composed of Orchestrator, Optimizer, Reviewer, and Executor agents. The system integrates with Claude Code via Model Context Protocol, automatic hooks, and real-time monitoring.

Hybrid search combines keyword matching (FTS5), graph traversal, and vector similarity with weighted scoring. Privacy-preserving evaluation, comprehensive testing, and production deployment enable persistent context across sessions, autonomous agent coordination, and continuous memory optimization.

This paper presents the architecture, validates claims against tagged source code (v2.2.0), compares with existing solutions (MemGPT, Mem0, LangChain Memory), and demonstrates production readiness through comprehensive testing and real-world integration.

The Challenge

Context Window Mathematics

Context windows constrain LLM working memory. Modern systems provide 32K-200K tokens, but effective memory drops to 10-15K tokens after system instructions (2-3K), conversation history (3-10K), and code context (10-20K)—roughly 10-15 pages of unique information.

Cost compounds the challenge. GPT-4 charges $0.03 per 1K input tokens, so a full 32K context costs roughly $0.96 per request. Repeated loading across sessions creates financial pressure to minimize context.
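The cost arithmetic above can be checked directly. A minimal sketch, using the illustrative rate from the text rather than current pricing:

```python
def context_cost_usd(tokens: int, rate_per_1k: float = 0.03) -> float:
    """Input-token cost of a single request at a given per-1K-token rate."""
    return tokens / 1000 * rate_per_1k

# A full 32K context at $0.03 per 1K input tokens:
print(context_cost_usd(32_000))  # 0.96
```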

The Re-initialization Tax

Every session starts with zero context. Developers spend 5-13 minutes reconstructing relevant information: identifying files (2-3 min), explaining tasks (1-2 min), providing architectural context (2-5 min), and referencing prior decisions (0-3 min).

For 40 sessions over a 2-week feature (4 per working day), that's 200-520 minutes (3.3-8.7 hours) spent on context management. At $100/hour, this inefficiency costs $330-$870 per feature.
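The arithmetic behind these figures, sketched with the text's illustrative numbers (session count, minutes per session, and hourly rate are all from the paragraphs above):

```python
def reinit_tax_usd(sessions: int, minutes_range=(5, 13), hourly_rate=100.0):
    """Dollar-cost range of per-session context reconstruction."""
    return tuple(sessions * m / 60 * hourly_rate for m in minutes_range)

low, high = reinit_tax_usd(40)
print(round(low), round(high))  # 333 867
```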

Multi-Agent Coordination Failures

Without shared memory, Agent A completes work that Agent B cannot access, forcing re-transmission. Race conditions arise when agents duplicate work unaware of each other; deadlocks occur when agents wait on one another in a cycle; and coordination failures become nearly impossible to debug without audit trails.

Architecture

Core Memory System

Hybrid Search: Three complementary techniques provide multi-modal retrieval: FTS5 keyword matching, graph traversal, and vector similarity, combined through weighted scoring.

Storage: LibSQL provides ACID guarantees, B-tree indexes on namespace/importance/created_at, FTS5 virtual tables, and ~800KB per 1,000 memories. Performance: 0.5ms get-by-ID, 0.88ms list-recent, 1.61ms hybrid-search.
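Because LibSQL is SQLite-compatible, the FTS5 keyword layer can be sketched with Python's stdlib sqlite3 module. The table and column names below are illustrative, not Mnemosyne's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative schema: an FTS5 virtual table holding memory content for keyword search.
con.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content)")
con.executemany("INSERT INTO memories_fts(content) VALUES (?)",
                [("agent coordination uses shared memory",),
                 ("hybrid search combines FTS5 and graph traversal",)])
# MATCH performs the keyword query; bm25() provides relevance ranking (lower is better).
rows = con.execute(
    "SELECT content FROM memories_fts WHERE memories_fts MATCH ? "
    "ORDER BY bm25(memories_fts)",
    ("hybrid",)).fetchall()
print(rows)  # [('hybrid search combines FTS5 and graph traversal',)]
```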

System Architecture

The following diagram shows the high-level component architecture and data flow through the system:

[Diagram: System Architecture]

Hybrid Search Architecture

Mnemosyne uses a three-strategy hybrid search system combining FTS5, graph traversal, and vector similarity:

[Diagram: Hybrid Search Architecture]
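The weighted combination of the three strategies can be sketched as a linear blend. The weights and score values below are hypothetical; Mnemosyne's actual weighting lives in the v2.2.0 source:

```python
def hybrid_score(keyword: float, graph: float, vector: float,
                 weights=(0.4, 0.3, 0.3)) -> float:
    """Blend per-strategy scores (each normalized to [0, 1]) into one ranking score."""
    wk, wg, wv = weights
    return wk * keyword + wg * graph + wv * vector

# A memory that matches keywords strongly but is weakly connected in the graph:
print(round(hybrid_score(0.9, 0.2, 0.5), 2))  # 0.57
```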

Data Flow

End-to-end data movement from user input through processing, storage, and retrieval:

[Diagram: Data Flow]

Four-Agent Framework

Specialized agents coordinate through Ractor actor supervision:

Multi-Agent Coordination Flow

The following diagram illustrates how the four agents interact during a typical work session:

[Diagram: Multi-Agent Coordination]
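Mnemosyne's agents are Rust actors under Ractor supervision; the underlying message-passing pattern can be approximated in Python with asyncio queues. The agent names come from the paper, but the message shapes here are hypothetical:

```python
import asyncio

async def executor(inbox: asyncio.Queue, results: asyncio.Queue):
    # The Executor performs a unit of work and reports back via a message.
    task = await inbox.get()
    await results.put({"task": task, "status": "done"})

async def orchestrator():
    inbox, results = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(executor(inbox, results))
    # The Orchestrator dispatches work as a message rather than a direct call,
    # so the result lands in shared state instead of evaporating with the caller.
    await inbox.put("consolidate-memories")
    outcome = await results.get()
    await worker
    return outcome

print(asyncio.run(orchestrator()))  # {'task': 'consolidate-memories', 'status': 'done'}
```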

Autonomous Evolution

LLM-guided optimization runs during idle periods, performing memory enrichment, linking, and consolidation.

Technology Stack

Core: Rust 1.75+ (type safety, zero-cost abstractions), Tokio (async runtime), LibSQL (SQLite-compatible with vector support), PyO3 0.22 (Python bindings, 10-20x faster than subprocess)

LLM: Claude Haiku 4.5 for enrichment, linking, consolidation (<500ms typical, 4-5x cheaper than Sonnet)

Protocols: MCP (JSON-RPC 2.0 over stdio), SSE (real-time events), Ractor message passing
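Since MCP messages are JSON-RPC 2.0 over stdio, request framing can be sketched as below. The tool name and arguments are hypothetical, not Mnemosyne's actual tool surface:

```python
import json

def jsonrpc_request(method: str, params: dict, req_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request for transport over stdio."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

msg = jsonrpc_request("tools/call",
                      {"name": "memory_search",
                       "arguments": {"query": "auth refactor"}})
print(msg)
```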


gRPC Remote Access (v2.2.0)

The RPC feature provides a production-ready gRPC server that enables external applications to store, search, and manage memories remotely. It is built on Tonic with Protocol Buffers for type-safe, high-performance access.

RPC Architecture

[Diagram: gRPC Architecture]

Services:

Language Support: Python, Go, Rust, JavaScript, or any gRPC-compatible language via Protocol Buffers
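A Protocol Buffers definition of this general shape would give every listed language a typed client. The service and message names below are purely illustrative; the actual .proto ships with the repository:

```protobuf
syntax = "proto3";

// Hypothetical sketch of a memory service; not the actual Mnemosyne proto.
service MemoryService {
  rpc Store (StoreRequest) returns (StoreResponse);
  rpc Search (SearchRequest) returns (SearchResponse);
}

message StoreRequest   { string namespace = 1; string content = 2; }
message StoreResponse  { string memory_id = 1; }
message SearchRequest  { string query = 1; uint32 limit = 2; }
message Memory         { string id = 1; string content = 2; }
message SearchResponse { repeated Memory results = 1; }
```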

Integrated Context Studio (ICS)

Terminal-based semantic editor with multi-panel UI, CRDT-based collaborative editing, and real-time validation:

[Diagram: ICS Architecture]
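The convergence property behind CRDT-based collaborative editing can be illustrated with a last-writer-wins register. This is a toy model for intuition only; a text editor like ICS would presumably use a sequence CRDT:

```python
from dataclasses import dataclass

@dataclass
class LWWRegister:
    """Last-writer-wins register: a minimal CRDT whose replicas always converge."""
    value: str = ""
    timestamp: float = 0.0

    def set(self, value: str, timestamp: float):
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

    def merge(self, other: "LWWRegister"):
        # Merging is commutative and idempotent, so replicas agree
        # regardless of the order in which updates arrive.
        self.set(other.value, other.timestamp)

a, b = LWWRegister(), LWWRegister()
a.set("draft v1", timestamp=1.0)
b.set("draft v2", timestamp=2.0)
a.merge(b); b.merge(a)
print(a.value == b.value)  # True
```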

Dashboard Monitoring

Real-time web-based monitoring of multi-agent orchestration with 6-panel TUI showing memory metrics, context usage, work progress, and agent coordination:

[Diagram: Dashboard Architecture]
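The real-time feed uses SSE (per the protocol list earlier); a minimal parser for the text/event-stream wire format is sketched below. The event names in the sample feed are hypothetical:

```python
def parse_sse(stream: str):
    """Parse Server-Sent Events wire format into (event, data) pairs."""
    events = []
    for block in stream.strip().split("\n\n"):  # a blank line terminates each event
        event, data = "message", []
        for line in block.splitlines():
            field, _, value = line.partition(": ")
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
        events.append((event, "\n".join(data)))
    return events

feed = 'event: memory_stored\ndata: {"id": "m1"}\n\nevent: agent_status\ndata: idle\n\n'
print(parse_sse(feed))  # [('memory_stored', '{"id": "m1"}'), ('agent_status', 'idle')]
```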

Comparison with Existing Systems

Feature | Mnemosyne | MemGPT | Mem0 | LangChain Memory
Memory Model | Hybrid (FTS5 + graph; vector planned) | Virtual context (RAM/disk pages) | Graph nodes | Conversation buffers
Multi-Agent Coordination | 4-agent framework | Single-agent focus | Limited (application layer) | None
Evolution System | Autonomous (LLM-guided) | Manual management | Limited automation | Manual cleanup
Integration | MCP + Hooks + CLI + Dashboard | Python library + API | REST API + SDKs | Python library
Implementation | Rust + Python bindings | Python | Python + Go | Python
Production Readiness | 702 tests, type safety | Research/experimental | Beta (production-ready) | Production (stable)

Mnemosyne treats memory and agents as unified concerns. Where MemGPT provides sophisticated single-agent memory management and Mem0 offers production-grade graph storage, Mnemosyne integrates persistent memory with multi-agent orchestration and autonomous evolution.

Validation & Evidence

Test Coverage

702 passing tests (100% pass rate) across all test categories.

Performance Metrics

Benchmarks from tests/performance/ confirm sub-millisecond retrieval: 0.5ms get-by-ID, 0.88ms list-recent, and 1.61ms hybrid-search.

Production Readiness

Stability is established through comprehensive testing and real-world integration.

Quality Gates

A multi-layered quality assurance process ensures production reliability. Every change must pass 8 validation gates:

[Diagram: Quality Gates]

Code Validation

Complete validation matrix available: validation.md

Every technical claim maps to v2.2.0 source code and tests.

Summary

Mnemosyne demonstrates that semantic memory and multi-agent orchestration form a unified system. The architecture delivers persistent context through hybrid search, multi-agent coordination via specialized agents with Ractor supervision, autonomous evolution through LLM-guided consolidation, and production integration via MCP protocol.

The system addresses fundamental challenges: context loss elimination (sessions maintain complete state), coordination infrastructure (shared memory enables debugging), cognitive load reduction (automatic context loading), and long-running workflow support (accumulation over weeks).

Resources