Research
Benchmarks & research foundations
MetaMemory is grounded in cognitive science and validated on production benchmarks. Here are the numbers, the methodology, and the research that informs our architecture.
Benchmark results
67.95%
LoCoMo F1
F1 score on the LoCoMo long-conversation benchmark, vs. the best published baseline of 43.24%.
92%
HotpotQA F1
F1 score on the HotpotQA multi-hop question answering benchmark.
67.40%
LongMemEval
Overall accuracy on LongMemEval benchmark, vs. GPT-4o baseline of 60.6%.
77%
LongMemEval-S
Overall LongMemEval-S score: 100% single-session accuracy, 72% multi-session accuracy.
70%
Memory Compression
LLM-powered consolidation reduces storage while preserving recall quality.
<100ms
Retrieval Latency
P95 latency for multi-channel retrieval with RRF fusion at production scale.
Evaluation methodology
LoCoMo (Long-Context Conversation Memory) evaluates memory systems on their ability to accurately recall information from extended multi-session dialogues. MetaMemory achieves 67.95% F1, compared to the best published baseline of 43.24%.
HotpotQA is a multi-hop question answering benchmark requiring reasoning across multiple documents. MetaMemory achieves 92% F1 on this benchmark.
LongMemEval evaluates long-term memory capabilities across single and multi-session conversations. MetaMemory scores 67.40% overall (vs. GPT-4o at 60.6%). On the LongMemEval-S variant, MetaMemory achieves 77% overall with 100% single-session and 72% multi-session accuracy.
Compression is measured as the ratio of consolidated memory store size to raw memory store size. The 70% compression figure represents the average across diverse conversation types.
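As a concrete illustration of the metric as defined above (the function name and the store sizes are illustrative, not benchmark data):

```python
def compression_ratio(raw_size, consolidated_size):
    # As defined in the methodology: ratio of consolidated memory
    # store size to raw memory store size. Sizes in bytes.
    return consolidated_size / raw_size

# e.g. a 10 MB raw store consolidated down to 7 MB:
compression_ratio(10_000_000, 7_000_000)  # -> 0.7
```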
Retrieval latency is measured at P95 under production load (1,000+ concurrent sessions) with multi-channel retrieval and RRF fusion enabled. Hardware: standard cloud instances (4 vCPU, 16 GB RAM) with PostgreSQL + pgvector.
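For reference, a P95 figure like the one above can be computed from raw latency samples with the nearest-rank method. This is a minimal sketch of the statistic itself, not MetaMemory's measurement pipeline (production systems typically read percentiles from their metrics infrastructure):

```python
import math

def p95(latencies_ms):
    # Nearest-rank P95: the smallest sample such that at least 95%
    # of all observations are at or below it.
    ordered = sorted(latencies_ms)
    idx = math.ceil(0.95 * len(ordered)) - 1
    return ordered[idx]
```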
Cognitive science foundations
Tulving's Memory Taxonomy
MetaMemory's four-vector architecture draws inspiration from Endel Tulving's taxonomy of long-term memory — semantic (facts), episodic (experiences), and procedural (skills). Our implementation maps these concepts to four engineered embedding types — semantic, emotional, process, and context — optimized for AI agent retrieval rather than being a 1:1 replica of the cognitive model.
Multi-Armed Bandits for Retrieval
Adaptive strategy selection uses Thompson Sampling and Upper Confidence Bound (UCB) algorithms from the multi-armed bandit literature. These are well-studied Bayesian methods for balancing exploration and exploitation — applied here to learn which retrieval channel works best for each query type.
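The Thompson Sampling half of this idea is compact enough to sketch. The following is an illustrative Beta-Bernoulli selector over retrieval channels — the channel names and the binary reward signal are assumptions for the example, not MetaMemory's actual implementation:

```python
import random

class ThompsonChannelSelector:
    """Beta-Bernoulli Thompson Sampling over retrieval channels."""

    def __init__(self, channels):
        # Beta(1, 1) prior: uniform belief about each channel's success rate.
        self.alpha = {c: 1.0 for c in channels}
        self.beta = {c: 1.0 for c in channels}

    def select(self):
        # Sample a plausible success rate per channel, pick the best.
        # Exploration falls out naturally: uncertain channels sometimes
        # draw high samples and get tried.
        samples = {c: random.betavariate(self.alpha[c], self.beta[c])
                   for c in self.alpha}
        return max(samples, key=samples.get)

    def update(self, channel, success):
        # Bayesian update: a success raises alpha, a failure raises beta.
        if success:
            self.alpha[channel] += 1
        else:
            self.beta[channel] += 1
```

After enough feedback, channels that consistently return useful memories dominate the selection while the rest are still probed occasionally.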
Memory Consolidation Theory
Inspired by the complementary learning systems (CLS) theory of hippocampal-neocortical memory transfer, MetaMemory's consolidation process mirrors what happens during sleep: related memories are merged, redundant information is compressed, and important connections are strengthened.
Reciprocal Rank Fusion
RRF is a well-established information retrieval technique for combining ranked results from heterogeneous sources. MetaMemory uses RRF to fuse results from its 5 specialized retrieval channels into a single ranking, avoiding the score calibration problems of raw score merging.
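The RRF rule itself is simple: each document scores the sum of 1/(k + rank) across the ranked lists it appears in. A minimal sketch over per-channel result-ID lists (k=60 is the constant from Cormack et al., 2009; the inputs are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of result IDs per retrieval channel.
    # score(d) = sum over channels of 1 / (k + rank_of_d_in_channel).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks are used, channels with incomparable raw scores (cosine similarity, BM25, recency) fuse cleanly without calibration.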
References
- [1] Tulving, E. (1972). Episodic and semantic memory. In Organization of Memory. Academic Press.
- [2] Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26(1).
- [3] McClelland, J.L., McNaughton, B.L., & O'Reilly, R.C. (1995). Why there are complementary learning systems in the hippocampus and neocortex. Psychological Review, 102(3).
- [4] Thompson, W.R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4).
- [5] Cormack, G.V., Clarke, C.L., & Büttcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proceedings of SIGIR '09.
- [6] Ramaswamy, S. et al. (2024). LoCoMo: Long-Context Conversation Memory Benchmark.
Want to run your own benchmarks?
Start with the free tier and test MetaMemory against your own evaluation suite. No credit card required.