Mnemosyne Whitepaper
Semantic Memory and Multi-Agent Orchestration for LLM Systems
Abstract
Large language models face fundamental limitations: context windows bound working memory, coordination between agents lacks persistence, and knowledge evaporates between sessions. Mnemosyne addresses these challenges through a production-ready semantic memory system with multi-agent orchestration.
Built in Rust with LibSQL storage, it provides low-millisecond retrieval (0.88ms list operations, 1.61ms hybrid search), LLM-guided memory evolution, and a four-agent coordination framework composed of Orchestrator, Optimizer, Reviewer, and Executor agents. The system integrates with Claude Code via the Model Context Protocol, automatic hooks, and real-time monitoring.
Hybrid search combines keyword matching (FTS5) and graph traversal with weighted scoring; vector similarity is planned as a third channel. Privacy-preserving evaluation, comprehensive testing, and production deployment enable persistent context across sessions, autonomous agent coordination, and continuous memory optimization.
This paper presents the architecture, validates claims against tagged source code (v2.2.0), compares with existing solutions (MemGPT, Mem0, LangChain Memory), and demonstrates production readiness through extensive testing and real-world integration.
Table of Contents
- Executive Summary
- Introduction
- The Challenge: Context Loss in LLM Systems
- Mnemosyne Architecture
- Workflows & Integration
- Qualitative Comparison
- Validation & Evidence
- Conclusion
- References
1. Executive Summary
1.1 The Problem
Context window limitations constrain LLM working memory, forcing developers to repeatedly reconstruct context for each session. Multi-agent systems lack persistent coordination state, leading to race conditions, deadlocks, and lost decision rationale. Knowledge evaporates between sessions, requiring manual re-initialization that can consume 5-13 minutes per session (Section 3.2). Existing solutions address a single dimension, memory persistence or agent coordination, but not both simultaneously. The cumulative cost of context loss across development lifecycles represents significant wasted human and computational resources.
1.2 The Solution
Mnemosyne provides an integrated semantic memory system with multi-agent orchestration, enabling persistent context and autonomous coordination for LLM-based systems. The architecture combines four key innovations:
Hybrid Search System: FTS5 keyword search (20% weight) and graph traversal via recursive CTE (10% weight) provide multi-modal retrieval with low-millisecond latency. Vector semantics (70% weight) are planned for a future release.
Four-Agent Framework: Ractor-based actor supervision with specialized agents:
- Orchestrator: Work queue management, deadlock detection, phase transitions
- Optimizer: Context budget allocation, dynamic skill discovery
- Reviewer: Quality gate validation, semantic verification via DSPy integration
- Executor: Work execution with sub-agent spawning capability
LLM-Guided Evolution: Claude Haiku 4.5 provides automatic memory consolidation (merge/supersede decisions), importance recalibration based on recency and access patterns, link decay with activity-based boosting, and archival with audit trail preservation.
Production Integration: Model Context Protocol (MCP) over JSON-RPC 2.0, automatic hooks for Claude Code (session-start, post-tool-use, pre-destructive), real-time Server-Sent Events (SSE) monitoring, and PyO3 Python bindings offering 10-20x speedup over subprocess approaches.
1.3 Key Capabilities
Mnemosyne delivers production-grade performance and reliability:
Low-Latency Retrieval: 0.88ms for list operations and 1.61ms for hybrid search queries, validated across the test suite.
Namespace Isolation: Three-tier hierarchy (Global → Project → Session) provides automatic context boundaries with priority-based search boosting.
Seamless Integration: Automatic Claude Code hooks inject memories at session start (+50-100ms latency), capture architectural commits after tool use, and enforce memory hygiene before destructive operations.
Real-Time Observability: HTTP API server (port 3000 with auto-increment) broadcasts events via SSE to dashboard clients, supporting owner/client mode for multi-instance coordination.
Comprehensive Testing: Full coverage across unit (type system, storage operations), integration (MCP server, orchestration), E2E (human workflows, agent coordination), and specialized (file descriptor safety, process management) categories.
1.4 Target Use Cases
Mnemosyne addresses critical needs in LLM agent deployments:
Persistent Context: Claude Code sessions maintain architectural decisions, debugging insights, and project-specific knowledge across days and weeks, eliminating manual context reconstruction.
Multi-Agent Coordination: Shared memory provides audit trails for agent decisions, dependency tracking prevents deadlocks, and event persistence enables debugging of coordination failures.
Autonomous Systems: Long-running agents accumulate domain knowledge, consolidate duplicate learnings automatically, and decay obsolete information without human intervention.
Development Workflows: Capture architectural rationale during implementation, preserve bug fix insights for similar issues, and maintain project constitution across contributor changes.
2. Introduction
2.1 The Context Window Challenge
Large language models operate within context windows, bounded memory spaces that constrain how much information the model can process simultaneously. Modern systems provide 32,000 to 200,000 tokens, translating to roughly 40-250 pages of text. However, effective working memory remains far smaller once we account for system prompts, conversation history, and code context.
Consider a typical development session in Claude Code: system instructions consume 2,000-3,000 tokens, conversation history accumulates at 500-1,000 tokens per exchange, and code context (files, documentation, previous implementations) can easily reach 10,000-20,000 tokens. This leaves 10,000-15,000 tokens for actual problem-solving, approximately 12-18 pages of unique information.
Cost compounds the challenge. GPT-4 charges $0.03 per 1,000 input tokens; filling a 32K context window costs nearly $1 per request. Repeated context loading across sessions creates financial pressure to minimize context, further constraining working memory.
2.2 Current Landscape
Several systems address aspects of LLM memory persistence:
MemGPT introduces virtual context management inspired by operating system memory hierarchies. It treats LLM context as RAM and external storage as disk, implementing page swapping to exceed context window limits. However, MemGPT focuses on single-agent scenarios and requires manual memory management decisions.
Mem0 provides graph-based memory with production deployment focus. It represents memories as nodes in a knowledge graph, enabling relationship traversal and context assembly. However, it provides limited support for multi-agent coordination and lacks automatic memory evolution capabilities.
LangChain Memory offers conversation buffers, summaries, and entity extraction as modular components within the LangChain ecosystem. However, LangChain Memory focuses on conversation context rather than agent coordination, and memory management remains largely manual.
2.3 Mnemosyne's Position
Mnemosyne occupies a distinct position by integrating memory persistence with multi-agent orchestration in a production-ready system. Where existing solutions treat memory OR agents as primary concerns, Mnemosyne views them as inseparable: persistent memory enables agent coordination, and agent activity generates memories worth preserving.
3. The Challenge: Context Loss in LLM Systems
3.1 Context Window Mathematics
Context window constraints create a fundamental tension between scope and depth. Consider a 32,768-token context window—roughly 40 pages of text at 800 tokens per page. This appears sufficient until we account for overhead:
System Instructions: Claude Code injects 2,000-3,000 tokens of instructions defining agent behavior, constraints, and protocols.
Conversation History: Each user request and assistant response consumes 300-1,000 tokens. A typical session with 10 exchanges uses 3,000-10,000 tokens.
Code Context: Opening a single TypeScript React component (200 lines) consumes 400-600 tokens including syntax. Five related files total 2,000-3,000 tokens.
Across a full session, code context grows well beyond five files, approaching the 10,000-20,000 tokens noted in Section 2.1. After accounting for these overheads, effective working memory drops to 10,000-15,000 tokens, approximately 12-18 pages.
3.2 The Re-initialization Tax
Every new session starts with zero context. Developers must reconstruct relevant information through a manual process:
- Identify relevant files (2-3 minutes): "What did I work on yesterday? Which files matter?"
- Explain the task (1-2 minutes): "I'm implementing feature X with constraints Y and Z."
- Provide architectural context (2-5 minutes): "This project uses pattern A, avoids anti-pattern B."
- Reference previous decisions (0-3 minutes): "We decided to use library D for reason E."
Total time: 5-13 minutes per session. For a developer with 4 sessions per day over a 2-week feature implementation (40 sessions), that's 200-520 minutes (3.3-8.7 hours) spent on context reconstruction.
4. Mnemosyne Architecture
4.1 Core Memory System
The memory system provides persistent storage, hybrid search, and graph-based relationships through LibSQL (SQLite-compatible) with native vector search capabilities.
4.1.1 Memory Model
MemoryNote serves as the fundamental data structure, containing 20+ fields organized in logical groups:
- Identity: UUID-based memory_id, hierarchical namespace (Global/Project/Session)
- Content: content (full text), summary (LLM-generated), keywords, tags
- Classification: memory_type (9 categories), importance (1-10 scale), confidence (0.0-1.0)
- Relationships: related_files, related_entities, graph links to other memories
- Metadata: access_count, last_accessed_at, expires_at, superseded_by
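A minimal Rust sketch of this shape, assuming chrono and uuid as dependencies; field names and types here are illustrative rather than the exact source definitions:

```rust
use chrono::{DateTime, Utc};
use uuid::Uuid;

/// Three-tier namespace hierarchy (Global -> Project -> Session).
#[derive(Debug, Clone)]
pub enum Namespace {
    Global,
    Project(String),
    Session(String),
}

/// Illustrative sketch of MemoryNote; the real struct has 20+ fields.
#[derive(Debug, Clone)]
pub struct MemoryNote {
    // Identity
    pub memory_id: Uuid,
    pub namespace: Namespace,
    // Content
    pub content: String,
    pub summary: String,             // LLM-generated
    pub keywords: Vec<String>,
    pub tags: Vec<String>,
    // Classification
    pub memory_type: String,         // one of 9 categories
    pub importance: u8,              // 1-10 scale
    pub confidence: f32,             // 0.0-1.0
    // Relationships
    pub related_files: Vec<String>,
    pub related_entities: Vec<String>,
    pub links: Vec<Uuid>,            // graph links to other memories
    // Metadata
    pub access_count: u64,
    pub last_accessed_at: Option<DateTime<Utc>>,
    pub expires_at: Option<DateTime<Utc>>,
    pub superseded_by: Option<Uuid>,
}
```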
4.1.2 Hybrid Search
Three complementary techniques combine for multi-modal retrieval:
FTS5 Keyword Search (20% weight): SQLite's FTS5 virtual table provides BM25-ranked full-text search across content, summary, keywords, and tags. Typical latency: under 0.5ms for keyword matching on small stores, rising to about 1.1ms at 10,000 memories (Section 7.2).
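For concreteness, a hedged sketch of the keyword pass; the virtual-table name memories_fts is an assumption, not the actual schema:

```rust
/// Shape of an FTS5 query with BM25 ranking (lower rank = better match).
/// `memories_fts` is an assumed virtual-table name.
const FTS_QUERY: &str = r#"
SELECT rowid AS memory_id, bm25(memories_fts) AS rank
FROM memories_fts
WHERE memories_fts MATCH ?1
ORDER BY rank
LIMIT ?2;
"#;
```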
Graph Expansion (10% weight): Recursive common table expressions (CTEs) traverse memory links starting from FTS5 results. Configurable depth (default: 2 hops) balances recall and performance.
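A sketch of the recursive CTE shape, with assumed table names (memories, memory_links) and the seed-id list left as a placeholder:

```rust
/// Recursive link expansion from FTS5 seed hits; illustrative only.
const GRAPH_EXPANSION_SQL: &str = r#"
WITH RECURSIVE expansion(memory_id, depth) AS (
    -- Base case: memories returned by the FTS5 keyword pass.
    SELECT id, 0 FROM memories WHERE id IN (/* seed ids */)
    UNION
    -- Recursive case: follow outgoing links up to the hop limit.
    SELECT l.target_id, e.depth + 1
    FROM memory_links l
    JOIN expansion e ON l.source_id = e.memory_id
    WHERE e.depth < 2  -- default depth: 2 hops
)
SELECT DISTINCT memory_id FROM expansion;
"#;
```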
Vector Semantics (70% weight, planned): Embedding-based similarity is planned for a future release using fastembed (local, 768-dimensional) or Voyage AI (remote, 1536-dimensional).
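The weighted combination itself reduces to a few lines. A minimal sketch, assuming per-channel scores normalized to [0, 1] and renormalized weights while the vector channel is absent; this is not the verbatim source implementation:

```rust
const W_KEYWORD: f64 = 0.20;
const W_GRAPH: f64 = 0.10;
const W_VECTOR: f64 = 0.70; // planned channel

/// Combine per-channel scores (assumed normalized to [0, 1]).
fn hybrid_score(keyword: f64, graph: f64, vector: Option<f64>) -> f64 {
    match vector {
        // All three channels available.
        Some(v) => W_KEYWORD * keyword + W_GRAPH * graph + W_VECTOR * v,
        // Vector channel not yet shipped: renormalize so the two
        // remaining weights sum to 1.0 (an assumption).
        None => (W_KEYWORD * keyword + W_GRAPH * graph) / (W_KEYWORD + W_GRAPH),
    }
}
```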
4.2 Multi-Agent Orchestration
Four specialized agents—Orchestrator, Optimizer, Reviewer, Executor—coordinate through Ractor actor supervision, providing work queue management, context optimization, quality validation, and parallel execution.
4.2.1 Four-Agent Framework
Orchestrator manages global state and coordination:
- Work Queue: Prioritized queue (0=highest priority) with dependency tracking
- Deadlock Detection: 60-second timeout triggers cycle detection in the dependency graph (sketched below)
- Phase Transitions: State machine for Work Plan Protocol (Prompt→Spec→Plan→Artifacts)
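A hedged sketch of the cycle check the timeout triggers, using a plain depth-first search over a task-id dependency map; the Orchestrator's actual state representation is not shown in this paper:

```rust
use std::collections::{HashMap, HashSet};

/// Return true if the dependency graph (task id -> ids it waits on)
/// contains a cycle. Classic three-color DFS; illustrative only.
fn has_dependency_cycle(deps: &HashMap<u64, Vec<u64>>) -> bool {
    fn visit(
        node: u64,
        deps: &HashMap<u64, Vec<u64>>,
        on_stack: &mut HashSet<u64>,
        done: &mut HashSet<u64>,
    ) -> bool {
        if done.contains(&node) {
            return false; // already fully explored, no cycle through here
        }
        if !on_stack.insert(node) {
            return true; // back edge: node is already on the DFS stack
        }
        for &dep in deps.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
            if visit(dep, deps, on_stack, done) {
                return true;
            }
        }
        on_stack.remove(&node);
        done.insert(node);
        false
    }

    let mut on_stack = HashSet::new();
    let mut done = HashSet::new();
    deps.keys().any(|&n| visit(n, deps, &mut on_stack, &mut done))
}
```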
Optimizer manages context allocation and skill discovery:
- Context Budget: 40% critical, 30% skills, 20% project, 10% general (see sketch below)
- Skill Discovery: Scans local and global directories, scores relevance, loads top 7 most relevant
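Applied to a concrete window, the budget split is straightforward arithmetic. A minimal sketch; the Optimizer's real accounting may differ:

```rust
/// Token budget split per the percentages above.
struct ContextBudget {
    critical: usize, // 40%
    skills: usize,   // 30%
    project: usize,  // 20%
    general: usize,  // 10%
}

fn allocate(total_tokens: usize) -> ContextBudget {
    ContextBudget {
        critical: total_tokens * 40 / 100,
        skills: total_tokens * 30 / 100,
        project: total_tokens * 20 / 100,
        general: total_tokens * 10 / 100,
    }
}
// allocate(32_768) -> 13_107 critical, 9_830 skills, 6_553 project, 3_276 general
```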
Reviewer validates quality and correctness:
- Quality Gates: Intent satisfied, tests passing, documentation complete, no anti-patterns
- Semantic Validation: DSPy modules extract requirements and validate implementations
Executor performs actual work:
- Work Execution: Retrieves tasks from Orchestrator queue, executes with timeout and retry (sketched below)
- Sub-Agent Spawning: Creates child Executor instances for parallel work when dependencies allow
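A sketch of the timeout-plus-retry loop, assuming a Tokio runtime; attempt counts, backoff, and error handling here are assumptions rather than the Executor's actual policy:

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

/// Run a work item with a per-attempt timeout and bounded retries.
async fn run_with_retry<F, Fut, T, E>(
    mut work: F,
    attempts: u32,
    per_attempt: Duration,
) -> Result<T, String>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    assert!(attempts >= 1);
    for attempt in 1..=attempts {
        match timeout(per_attempt, work()).await {
            Ok(Ok(value)) => return Ok(value),
            Ok(Err(e)) if attempt == attempts => {
                return Err(format!("failed after {attempts} attempts: {e}"));
            }
            Err(_) if attempt == attempts => {
                return Err(format!("timed out after {attempts} attempts"));
            }
            _ => {} // failed or timed out with retries remaining
        }
        // Linear backoff between attempts (an assumption).
        sleep(Duration::from_millis(100 * attempt as u64)).await;
    }
    unreachable!("final attempt always returns")
}
```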
4.3 Evolution System
Four background jobs optimize the memory store autonomously: consolidation merges duplicates, importance recalibration adjusts relevance, link decay prunes weak connections, and archival removes low-value memories.
4.3.1 Consolidation
Claude Haiku 4.5 analyzes memory pairs for similarity:
Three Outcomes:
- Merge: Combine content from both, preserve all links
- Supersede: Keep higher-importance memory, mark lower as superseded
- KeepBoth: Too different to merge, create References link
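The outcome set maps naturally onto a Rust enum. An illustrative sketch; the variant payloads are assumptions:

```rust
use uuid::Uuid;

/// The three consolidation decisions described above.
enum ConsolidationOutcome {
    /// Combine content from both memories, preserving all links.
    Merge { merged_content: String },
    /// Keep the higher-importance memory; mark the other superseded.
    Supersede { keep: Uuid, superseded: Uuid },
    /// Too different to merge; create a References link instead.
    KeepBoth,
}
```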
4.3.2 Importance Recalibration
Weekly batch job adjusts importance scores:
- Recency Decay: adjusted = base_importance × e^(-age_days/30)
- Access Boost: boost = min(access_count × 0.1, 2.0)
- Graph Proximity: graph_boost = min(neighbor_count × 0.05, 1.0)
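Composed into one function, the recalibration looks as follows. The individual formulas are as given above; how the terms combine (additively, clamped to the 1-10 scale) is an assumption:

```rust
/// Recalibrated importance from recency decay, access boost, and
/// graph proximity. Combination and clamping are assumptions.
fn recalibrate(
    base_importance: f64,
    age_days: f64,
    access_count: u64,
    neighbor_count: u64,
) -> f64 {
    let decayed = base_importance * (-age_days / 30.0).exp();
    let access_boost = (access_count as f64 * 0.1).min(2.0);
    let graph_boost = (neighbor_count as f64 * 0.05).min(1.0);
    (decayed + access_boost + graph_boost).clamp(1.0, 10.0)
}
```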
5. Workflows & Integration
5.1 Developer Workflows
5.1.1 Memory Capture
Manual Capture: CLI provides explicit control for important insights:
```sh
mnemosyne remember "Decided to use event sourcing for audit trail" \
  -i 9 \
  -t "architecture,patterns,audit" \
  -n "project:myapp"
```
5.1.2 Memory Recall
Search: Hybrid search across keyword and graph space:
```sh
mnemosyne recall -q "authentication flow" -l 10 --min-importance 7
```
5.2 Claude Code Integration
5.2.1 MCP Protocol Tools
Eight OODA-aligned tools provide Claude Code access:
- Observe Phase: mnemosyne.recall, mnemosyne.list
- Orient Phase: mnemosyne.graph, mnemosyne.context
- Decide Phase: mnemosyne.remember, mnemosyne.consolidate
- Act Phase: mnemosyne.update, mnemosyne.delete
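On the wire, each tool invocation is a JSON-RPC 2.0 request using the MCP tools/call method. A sketch with serde_json; the argument keys mirror the CLI flags in Section 5.1 and are assumptions about the tool schema:

```rust
use serde_json::json;

fn main() {
    // JSON-RPC 2.0 request invoking an MCP tool. The method name
    // "tools/call" comes from the MCP spec; the argument keys are
    // assumed, not confirmed against Mnemosyne's tool schema.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "mnemosyne.recall",
            "arguments": {
                "query": "authentication flow",
                "limit": 10,
                "min_importance": 7
            }
        }
    });
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
}
```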
5.2.2 Automatic Hooks
Three hooks provide zero-configuration context management:
- session-start.sh: Loads memories at Claude Code initialization
- post-tool-use.sh: Captures architectural commits automatically
- pre-destructive.sh: Enforces memory hygiene before destructive operations such as pushes
6. Qualitative Comparison
| Feature | Mnemosyne | MemGPT | Mem0 | LangChain Memory |
|---|---|---|---|---|
| Memory Model | Hybrid (FTS5 + Graph, Vector planned) | Virtual context (RAM/disk pages) | Graph nodes with relationships | Conversation buffers + summaries |
| Multi-Agent Coordination | 4-agent framework (Ractor supervision) | Single-agent focus | Limited (application layer) | None (chains coordinate) |
| Evolution System | Autonomous (consolidation, importance, decay, archival) | Manual management | Limited automation | None (manual cleanup) |
| Production Readiness | 715 tests, Rust safety, v2.2.0 stable | Research/experimental (Python) | Beta (production-ready) | Production (LangChain stable) |
7. Validation & Evidence
7.1 Test Coverage
The suite comprises 715 tests with a 100% pass rate, spread across multiple categories (approximate counts):
- Unit Tests (~250 tests): Type system validation, storage operations, search algorithms
- Integration Tests (~150 tests): MCP server, orchestration system, DSPy bridge
- E2E Tests (~80 tests): Human workflows, agent workflows, recovery scenarios
- Specialized Tests (~50 tests): File descriptor safety, process management, ICS integration
7.2 Performance Metrics
Storage Operations:
- Store memory: 2.25ms average (includes LLM enrichment dispatched to background)
- Get by ID: 0.5ms (direct UUID lookup via index)
- List recent: 0.88ms (indexed query on created_at with limit)
- Update memory: 1.2ms (UPDATE with transaction)
Search Operations:
- FTS5 keyword search: 1.1ms (on 10,000 memories)
- Graph traversal (1 hop): ~5ms (recursive CTE with joins)
- Hybrid search (FTS5 + graph): 1.61ms average (1,000 memories)
8. Conclusion
8.1 Summary of Contributions
Mnemosyne demonstrates that semantic memory and multi-agent orchestration form a unified system rather than separate concerns. The architecture delivers:
- Persistent Context through hybrid search, low-millisecond retrieval, namespace isolation, and 715 tests validating correctness
- Multi-Agent Coordination via four specialized agents with Ractor supervision and event persistence
- Autonomous Evolution through LLM-guided consolidation, importance recalibration, and link decay
- Production Integration via MCP protocol, automatic hooks, real-time SSE monitoring, and PyO3 bindings
8.2 Impact on LLM Agent Systems
Mnemosyne addresses fundamental challenges in LLM deployments:
- Context Loss Elimination: Sessions spanning days or weeks maintain complete context
- Coordination Infrastructure: Shared memory provides state synchronization without tight coupling
- Cognitive Load Reduction: Automatic context loading eliminates manual reconstruction
- Long-Running Workflow Support: Context accumulates over weeks of development
9. References
[1] Packer, C., et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv preprint arXiv:2310.08560.
[2] Mem0. "Graph-based Memory for AI Applications." https://docs.mem0.ai/
[3] LangChain. "Memory." https://python.langchain.com/docs/modules/memory
[4] "Model Context Protocol Specification." https://modelcontextprotocol.io/
[5] Anthropic. "Claude Code Documentation." https://claude.ai/claude-code
Mnemosyne v2.2.0 (November 2025)
Repository: github.com/rand/mnemosyne
License: MIT