There's a dirty secret in the AI agent ecosystem: most memory systems aren't memory systems at all. They're buckets. You put things in, you pull things out, and everything inside is treated with identical indifference — the conversation from this morning weighted the same as the context from six months ago. Recent. Stale. Critical. Noise. The bucket doesn't care.
This is why your AI agent keeps forgetting what matters, keeps surfacing irrelevant context, and keeps bloating your context window with garbage from last quarter. It's not a model problem. It's a memory architecture problem. And it's solvable — but not by adding more storage.
The Flat Memory Problem
Every major memory framework ships the same basic abstraction: a store with timestamps and similarity search. Mem0, Zep, Letta — respected projects, genuinely useful as storage layers. But storage is not memory. Memory is storage plus physics.
In current implementations, here's what happens in practice:
- An agent interaction fires. Some facts get extracted.
- Those facts go into a vector store with a timestamp.
- On the next retrieval, semantic similarity scores determine what surfaces.
- Everything in the store is a candidate. Nothing is ranked by relevance over time.
The result is a retrieval landscape that's flat. Your agent equally “remembers” that a user prefers dark mode UI and that they're running a critical infrastructure migration — because both items scored 0.87 similarity to the current query. There's no gravitational pull toward what actually matters right now. There's no forgetting of what no longer matters at all.
This produces three compounding failure modes:
- Context bloat. Retrieval systems that don't decay surface everything. Everything gets shoved into the context window. Token costs explode, latency grows, and the model's attention dilutes across months of accumulated cruft.
- Irrelevant recall. Without reinforcement mechanics, a casual mention from month one has the same retrieval weight as a repeated, high-stakes topic the user returns to constantly. The system treats frequency and recency as identical — which they are not.
- No prioritization. Nothing tells the memory system what matters. Timestamps are not importance signals. Cosine similarity is not urgency. The retrieval system is blind to the difference between noise and signal.
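To make the failure concrete, here's a minimal sketch of flat retrieval. The vectors, memory texts, and field names are toy illustrations, not any framework's real API:

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

memories = [
    {"text": "prefers dark mode", "age_days": 180, "vec": [0.9, 0.1]},
    {"text": "running critical infra migration", "age_days": 1, "vec": [0.9, 0.1]},
]
query_vec = [1.0, 0.0]

# Flat scoring: age never enters the ranking.
for m in memories:
    m["score"] = cosine(m["vec"], query_vec)

# A six-month-old aside and yesterday's critical context tie exactly.
assert memories[0]["score"] == memories[1]["score"]
```

The 180-day-old preference and the one-day-old migration get identical retrieval priority, which is the flat-memory problem in one assertion.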
The fundamental issue: these systems treat memory as a filing cabinet. Retrieve by keyword. Done. But the human brain — the only working reference implementation we have for long-term associative memory — operates nothing like a filing cabinet. It operates like thermodynamics.
Memory Has Temperature
Here's the insight that changes the architecture entirely: memories aren't equal — they have temperature.
Hot memories are recent, repeatedly accessed, or actively reinforced. They surface instantly. A user's ongoing project architecture decision is hot. The edge case they mentioned once in passing is not.
Cold memories are stale, unreinforced, low-access. They fade naturally. Not deleted — cooled. Still retrievable under the right conditions, but no longer competing with hot memories for surface space.
This isn't a decorative metaphor. It's a design specification. In thermodynamic memory, every stored fact carries a thermal state — a numerical representation of its current retrieval priority — governed by a decay function:
T(t) = T₀ × e^(−λt) + Σ(reinforcement_events)

where:
- λ = decay constant (configurable per memory type)
- T₀ = initial temperature
- reinforcement = additive heating on re-access or relevance confirmation
This single change transforms retrieval from keyword lookup into prioritized surfacing. The retrieval engine doesn't just ask “is this similar?” It asks “is this similar and is it still hot?”
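That combined question can be sketched in a few lines. The function names and the 14-day half-life below are illustrative assumptions, not SULCUS's actual API:

```python
import math

def temperature(t0, lam, age_seconds, heat=0.0):
    # T(t) = T0 * e^(-lambda * t) + accumulated reinforcement heat
    return t0 * math.exp(-lam * age_seconds) + heat

def retrieval_score(similarity, temp):
    # Surface memories that are both similar AND still hot.
    return similarity * temp

DAY = 86400.0
lam = math.log(2) / (14 * DAY)  # illustrative 14-day half-life

fresh = retrieval_score(0.87, temperature(1.0, lam, 1 * DAY))
stale = retrieval_score(0.87, temperature(1.0, lam, 180 * DAY))

# Same 0.87 similarity, very different priority.
assert fresh > stale
```

The two memories from the earlier example now separate cleanly: identical similarity, but thermal state breaks the tie in favor of what's hot.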
How SULCUS Works
SULCUS is built around a thermodynamic decay engine as a first-class primitive. It's not a wrapper around an existing vector store with decay bolted on. The decay function is core infrastructure.
Thermodynamic Decay Engine
Every memory object carries a thermal metadata block: initial temperature, decay constant, last reinforcement timestamp, and an accumulated heat value. Configurable half-lives per memory type are the critical design decision. Not all facts decay at the same rate. A user's name: very slow decay, measured in months. A specific bug they mentioned in passing: fast decay, measured in days. An ongoing task they return to weekly: medium decay with reinforcement spikes on each return.
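Here's how per-type half-lives translate into decay constants, as a sketch with invented values rather than SULCUS's shipped defaults:

```python
import math

DAY = 86400.0

# Illustrative per-type half-lives (not SULCUS's actual configuration).
HALF_LIVES = {
    "identity": 180 * DAY,    # user's name: very slow decay
    "task": 14 * DAY,         # ongoing work: medium decay
    "incidental": 2 * DAY,    # passing bug mention: fast decay
}

def decay_constant(memory_type):
    # lambda = ln(2) / half_life, so temperature halves every half_life seconds.
    return math.log(2) / HALF_LIVES[memory_type]

def temp_after(memory_type, age_seconds, t0=1.0):
    return t0 * math.exp(-decay_constant(memory_type) * age_seconds)

# After one week, a passing mention has nearly cooled,
# while an identity fact has barely moved.
assert temp_after("incidental", 7 * DAY) < 0.1
assert temp_after("identity", 7 * DAY) > 0.9
```

The half-life framing is the point of the design: operators reason in "this fact should matter for about two weeks," and λ falls out mechanically.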
Spaced Repetition Reinforcement
Borrowing from cognitive science — specifically the Ebbinghaus forgetting curve — SULCUS reinforces memories each time they're accessed or re-confirmed. Direct access triggers a full reinforcement spike. Indirect relevance triggers a partial spike. Contradiction from new context triggers negative reinforcement (cooling). The system learns importance through use rather than requiring explicit tagging.
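The three reinforcement signals can be sketched as additive heat deltas. The spike magnitudes here are invented for illustration, not SULCUS's tuned values:

```python
# Hypothetical reinforcement magnitudes (illustrative only).
FULL_SPIKE = 0.5      # direct access
PARTIAL_SPIKE = 0.2   # indirect relevance
COOLING = -0.3        # contradiction from new context

def reinforce(heat, event):
    # Map each signal type to its heat delta; clamp at absolute zero.
    delta = {
        "direct_access": FULL_SPIKE,
        "indirect_relevance": PARTIAL_SPIKE,
        "contradiction": COOLING,
    }[event]
    return max(0.0, heat + delta)

heat = 0.0
heat = reinforce(heat, "direct_access")       # full spike
heat = reinforce(heat, "indirect_relevance")  # partial spike
heat = reinforce(heat, "contradiction")       # negative reinforcement
assert abs(heat - 0.4) < 1e-9
```

Note the clamp: cooling can drive a memory toward cold, but accumulated heat never goes negative, so a contradicted fact fades rather than becoming anti-retrievable.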
CRDT Sync for Cross-Agent Memory
Multi-agent architectures introduce a hard problem: memory synchronization. If Agent A and Agent B share a user context, who owns the memory? SULCUS uses CRDTs (Conflict-free Replicated Data Types) for memory state — distributed memory across agent instances merges deterministically without coordination overhead. No locks. No primary node. Every agent in the mesh converges to the same thermal state.
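The convergence guarantee is easy to see in miniature: a state-based CRDT only needs a merge function that is commutative, associative, and idempotent. A toy merge over a memory's thermal state (field names are illustrative, not SULCUS's wire format):

```python
# Hypothetical state-based CRDT merge: element-wise max is commutative,
# associative, and idempotent, so replicas converge regardless of merge order.

def merge(state_a, state_b):
    return {
        "heat": max(state_a["heat"], state_b["heat"]),
        "last_reinforced": max(state_a["last_reinforced"], state_b["last_reinforced"]),
    }

agent_a = {"heat": 0.7, "last_reinforced": 1700000100}
agent_b = {"heat": 0.4, "last_reinforced": 1700000500}

# Merge order doesn't matter: both replicas converge to the same state.
assert merge(agent_a, agent_b) == merge(agent_b, agent_a)
# Re-merging adds nothing: idempotence means no coordination is needed.
assert merge(merge(agent_a, agent_b), agent_a) == merge(agent_a, agent_b)
```

A production system needs richer state than a max (e.g., tracking reinforcement events per replica), but the property that matters is the same: deterministic convergence with no locks and no primary.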
MCP Native
SULCUS implements the Model Context Protocol (MCP) natively. Drop it into Claude Desktop, wire it into your MCP-compatible agent framework, and thermodynamic memory is live with zero custom integration code.
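For Claude Desktop specifically, MCP servers are registered in its claude_desktop_config.json under the mcpServers key. A hypothetical entry might look like the following (the sulcus-mcp command and its flags are assumptions; check the SULCUS docs for the real invocation):

```json
{
  "mcpServers": {
    "sulcus": {
      "command": "sulcus-mcp",
      "args": ["--db", "/var/lib/sulcus"]
    }
  }
}
```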
Self-Hosted First
SULCUS runs on an embedded PostgreSQL instance by default. Zero external cloud dependency. No SaaS account required. No data leaves your infrastructure. For teams that want managed hosting, a cloud tier is available — but self-hosted is the default path.
MemBench: Show Your Work
Claims about memory quality are cheap. Benchmarks are not.
MemBench is an open benchmark for evaluating AI memory systems across four dimensions: recall precision, temporal relevance, noise rejection, and cross-agent coherence.
Run it against Mem0. Run it against Zep. Run it against SULCUS. Compare the numbers. The benchmark is the argument. We're publishing it because the current landscape of memory evaluation is nearly all marketing. “We have memory” is not a specification. MemBench forces specificity.
SULCUS vs. The Field
| | Mem0 | Zep | Letta | SULCUS |
|---|---|---|---|---|
| Memory model | Flat + similarity | Flat + recency | Stateful agent | Thermodynamic decay |
| Decay function | ❌ | Partial | ❌ | Configurable half-lives |
| Reinforcement | ❌ | ❌ | ❌ | Spaced repetition |
| CRDT sync | ❌ | ❌ | ❌ | Native |
| MCP native | ❌ | ❌ | ❌ | Yes |
| Self-hosted | Partial | Yes | Yes | Yes (embedded PG) |
| Open benchmark | ❌ | ❌ | ❌ | MemBench |
The Architecture Decision You'll Regret Not Making Early
Memory architecture is one of those decisions that seems low-stakes until it isn't. You start with a simple vector store. Context stays small. Everything works. Then your agent accumulates six months of user history, retrieval starts surfacing noise, context windows balloon, and you realize you've built a flat memory system you now need to migrate off.
The thermodynamic approach requires more design upfront — configuring decay profiles, thinking about reinforcement signals, understanding your hot-window requirements. But it pays compound returns. The longer your agent runs, the more intelligent its recall becomes. Memory that's been used stays hot. Memory that doesn't matter cools away. The system improves through use rather than degrading through accumulation.
That's what memory is supposed to do.
Try SULCUS
SULCUS is available now at sulcus.ca. MCP server works with Claude Desktop out of the box. Self-hosted with embedded PG — no cloud dependency required.
The bucket era of AI memory is over. Your agents deserve physics.