Escaping the AI amnesia - Anthony Skenter

I work across several machines, including a laptop and a desktop, and I use Claude Code and Codex on all of them. Each session is sharp and useful, and then it ends and is completely gone. The next one starts from nothing, and if I switch machines I've lost not just the conversation but the machine that held the breadcrumbs. For a quick task that's fine. For a project that runs for months it becomes a tax I pay every day: explaining the same constraints again, deriving the same decisions again, teaching the same preferences again. Claude Code softens the blow with per-project instruction files and a local automatic memory feature, but both are tied to one machine and one moment. They don't cross to another device, and they don't learn. They hold only what gets explicitly written down.

Claude Journal is the system I built to stop paying that tax. It runs quietly alongside Claude Code. Every session, on every machine, it records a small trace of what happened. Once a day, and sometimes more often, a single step in the cloud reads those traces and distills them into the two things worth keeping. The first is memories, the durable facts and preferences about me and my projects. The second is skills, the techniques it has watched me use enough times that they are worth reusing. Those results flow back to every device, so the assistant starts each session already knowing what it worked out in the last one, even if the last one was on a different machine a week ago. The purpose is narrow and practical. I want an AI assistant that accumulates understanding of my projects and the way I work, instead of resetting to zero every time I open it.

The design isn't arbitrary. It's borrowed, fairly directly, from the way human memory works. You capture the day cheaply, consolidate the gist while you sleep, and let almost everything else fade. That analogy runs through the whole thing, so it's where this piece begins. The actual plumbing lives in a reference section at the end, because the why here is more interesting than the wiring.

A useful kind of amnesia

A language model has no memory between calls. Everything it "knows" about your session lives in the context window, and when the session closes the context is discarded. That window is, functionally, working memory: fast, capacious enough for the task in front of it, and entirely volatile.

What survives a session is a separate, thinner layer. Claude Code keeps an automatic memory for each project, a small folder where it saves each durable fact as its own little file, and at the start of every session it reloads an index of those notes, so the assistant opens already knowing what it recorded before. The instinct is right: a long-term store sitting beside the volatile window. What it lacks is the part that makes long-term memory useful. It never merges related notes or generalizes across them, and the folder stays on the one machine that wrote it. It accumulates facts but never consolidates them, and that gap is the whole reason the rest of this exists.

The obvious fix is to keep everything. You dump every transcript into a database, embed it, and retrieve the relevant chunks on demand. That's the path most of these projects take, and I deliberately didn't take it. To see why, it helps to look at how the one system we know that does this really well, a brain, pulls it off.

How memory actually works

The rough picture neuroscience gives us is that memory isn't one thing, and it isn't a recording. It's a pipeline.

During the day, experience lands in a fast, temporary store centered on the hippocampus, as rich, detailed episodic traces of what just happened. This store is cheap to write and not built to last. The durable version lives elsewhere, in the neocortex, and getting it there is a separate, slower process called consolidation. The striking part is when it happens: largely during sleep. In slow-wave sleep the brain replays the day's hippocampal traces and selectively moves the salient ones into long-term cortical storage. The rest decays.

Two things about that are worth dwelling on, because the whole design rests on them.

The first is that forgetting is a feature, not a bug. A brain that kept everything in full fidelity would be useless, drowning in detail and unable to generalize. We keep the gist and shed the verbatim, and that compression is exactly what lets us form concepts instead of just accumulating transcripts. Consolidation is as much about deciding what to throw away as what to keep.

The second is that there are two kinds of long-term memory, and they run on different machinery. Declarative memory holds facts and events: that Paris is the capital of France, that you had eggs this morning. Procedural memory holds skills: riding a bike, touch typing, the muscle sequence of a backhand. You can state a fact the instant you learn it. A skill you can only acquire by doing it, over and over, until it compiles down into something automatic you no longer think about. And procedural memory forms through repetition spaced over time, which is why cramming fails and distributed practice sticks. One good attempt doesn't make a skill. A move you reach for again and again, across days, does.

Hold onto those two ideas, that you consolidate the gist and discard the noise, and that facts and skills are different things, because the system is essentially those two ideas turned into software.

Distillation, not retrieval

This is why I didn't reach for a vector database. Retrieval keeps everything and hopes the right fragment is nearby in embedding space when you query. It's the opposite of how memory works. There is no consolidation, no forgetting, no gist, just the full raw episodic record, fished back out by similarity. I think it's the wrong model, for three reasons.

Signal. A distilled memory is something a process decided was worth keeping, a curated fact or rule. A retrieved chunk is whatever happened to sit near a query vector. The first is knowledge. The second is a lucky guess about relevance. Flooding the context with retrieved transcript fragments is the digital equivalent of being unable to recall a lesson without reliving the entire afternoon that taught it.

Cost and trust. Distillation is expensive, because it takes a model to do the summarizing, so it should happen once, on a schedule, not on every device and not on every keystroke. The output is small, plain text, and reviewable. I can open a memory file and read, in plain English, exactly what my assistant believes about a project, and delete it if it's wrong. You cannot do that with a few thousand numeric vectors.

Control over behavior. The riskiest thing an assistant can carry between sessions isn't a fact. It's a rule, an instruction that silently changes how it behaves everywhere. A retrieval system can't tell the difference between "the API base URL is X" and "always refactor without asking." Distillation can, and treating those two with different levels of caution turns out to be the most important design decision in the whole thing.

The brain doesn't store your day as a searchable log. It keeps the lesson and forgets the keystrokes. So should this.

Facts and skills

If keeping the gist is the first idea borrowed from biology, the split between declarative and procedural memory is the second, and it's the one that gives the system its most interesting behavior.

Declarative memories are the facts and rules. The consolidator reads the day's sessions and asks, of each one, would this still matter next month? A fact about me, the project, or an external system, something like "this repo freezes merges on Thursdays" or "auth is split across two services for a compliance reason," becomes a small memory and is applied automatically. These are cheap to get wrong, because a bad fact wastes one lookup and no more.

Skills are procedural memory, and they're treated completely differently, because, like a habit, a skill encodes behavior, and behavior has reach. In Claude Code a skill is a reusable, self-contained technique, a bundle of "here's when to use this, and here's how." Once it exists, the assistant can invoke it on its own initiative. That's powerful, and it's exactly why it can't be learned carelessly. A wrong fact is a wrong answer. A wrong skill is a wrong reflex, fired automatically, on every task that looks vaguely relevant.

So the system borrows biology's safeguard against bad habits. It won't compile a skill from a single experience. A technique is only proposed as a skill when the same move shows up across at least two sessions on two different days. That threshold is the spacing effect, turned into a rule. A clever solution used once is a memory at most. A method I reach for repeatedly, on separate days, is something my hands have learned, and only that earns promotion to a procedure. That two-day gate is the single most important line of defense against canonizing a lucky hack as a permanent capability.
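As a rough illustration, the gate itself is a simple counting rule. The sketch below assumes the hard part has already happened, that something (in the real system, the model) has recognized which named technique each session used; the `Sighting` record, the function name, and the technique names are hypothetical, not the tool's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sighting:
    """One observation of a candidate technique in one session (hypothetical shape)."""
    technique: str
    session_id: str
    day: date


def promotable(sightings: list[Sighting], min_sessions: int = 2, min_days: int = 2) -> set[str]:
    """Return techniques seen in at least `min_sessions` distinct sessions
    spread over at least `min_days` distinct days."""
    sessions: dict[str, set[str]] = defaultdict(set)
    days: dict[str, set[date]] = defaultdict(set)
    for s in sightings:
        sessions[s.technique].add(s.session_id)
        days[s.technique].add(s.day)
    return {
        t for t in sessions
        if len(sessions[t]) >= min_sessions and len(days[t]) >= min_days
    }


sightings = [
    # Two sessions, but both on the same day: not enough spacing.
    Sighting("bisect-flaky-test", "s1", date(2026, 6, 8)),
    Sighting("bisect-flaky-test", "s2", date(2026, 6, 8)),
    # Two sessions on two different days: passes the gate.
    Sighting("trace-session-refresh", "s3", date(2026, 6, 8)),
    Sighting("trace-session-refresh", "s4", date(2026, 6, 10)),
]
```

Counting distinct days separately from distinct sessions is what stops a burst of repetition inside one afternoon from looking like a learned habit.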

There's a second safeguard, also borrowed from how habits form. A skill is never installed silently. The consolidator can only propose one. I review it and choose to accept, skip, or edit, and an accepted skill is then installed and synced to every device at once, the way a motor skill, once learned, is available to you no matter which hand or which court you're on. Each proposal carries its provenance too, the sessions and the days that distilled it, so I can always see why the system thinks this is a skill before I let it become one. Forming a habit deliberately, with a moment of review before it sets, is precisely how you avoid waking up with one you didn't choose.

The same logic governs the smaller stuff. A behavioral rule the system infers from how I work, like "stop summarizing at the end" or "don't mock the database in tests," changes behavior everywhere, so it too goes through the review queue rather than applying on its own. It's a habit you get to veto before it forms. Plain facts don't need that. They just get remembered. The dividing line throughout is blast radius. Automate what's cheap to get wrong, and put a human in the loop for anything that quietly rewires how the assistant acts.
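The blast-radius rule can be pictured as a tiny routing function. This is a sketch of the idea, not the tool's code: the kind labels are hypothetical, and it deliberately uses an allowlist so that any output kind it doesn't recognize falls through to review rather than applying silently.

```python
from enum import Enum


class Route(Enum):
    APPLY = "apply"      # written straight into the distilled store
    PROPOSE = "propose"  # queued for an accept, skip, or edit decision


# Hypothetical kind labels. Only these cheap-to-get-wrong outputs apply on their own.
AUTO_APPLY = {"digest", "fact"}


def route(kind: str) -> Route:
    """Apply low-blast-radius outputs; send everything else to the review queue.

    Unknown kinds fall through to PROPOSE, so a new kind of output fails closed.
    """
    return Route.APPLY if kind in AUTO_APPLY else Route.PROPOSE
```

Inferred feedback rules, skills, and instruction-file edits all land in `Route.PROPOSE` here simply because they aren't on the allowlist.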

The shape of the system

Turned into machinery, the metaphor has three moves and one shared spine.

[Figure: the claude-journal pipeline. Devices capture breadcrumbs into an encrypted repo, a cloud routine consolidates them into digests, memories, skills, and proposals, and every device pulls the distilled output back.]

The loop. Each device writes a cheap breadcrumb, the episodic trace, when a session closes. A single cloud routine plays the role of sleep: once a day, or more often, it reads the raw traces and consolidates them into durable memories, skills, and proposals. Those propagate back to every device and surface at the next session start. Only the middle step uses an LLM.

Capture is the episodic trace. When a session ends, each device records a small, factual breadcrumb of what happened and pushes it to a shared, encrypted repository. No model runs here. It's as cheap and automatic as laying down a memory of your afternoon.

Consolidate is sleep. A single routine, running in the cloud once for my whole account, wakes on a schedule, reads the raw breadcrumbs from every device at once, and distills them into the durable artifacts, namely daily digests, declarative memories, procedural skills, and the proposals that need my approval. Running it centrally is the whole point, because synthesis across devices only works if one process sees everything.

Propagate is recall made automatic. Each device pulls the consolidated output, the accepted memories drop into the place the assistant already looks, and pending proposals surface at the next session start for a quick accept, skip, or edit. I switch machines and the assistant simply starts already knowing what it learned somewhere else.
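The device side of propagation is mostly file copying. Here is a minimal sketch, assuming a hypothetical journal layout of `memories/<project>/*.md` for accepted memories and `proposals/pending/*.md` for the review queue, and leaving the destination (wherever the assistant already looks for memory) as a parameter rather than guessing its path.

```python
import shutil
from pathlib import Path


def propagate(journal: Path, project: str, memory_dir: Path) -> list[str]:
    """Copy accepted memories for `project` into `memory_dir` and return the
    names of pending proposals to surface at session start.

    The journal layout used here is hypothetical:
      journal/memories/<project>/*.md   accepted memories
      journal/proposals/pending/*.md    proposals awaiting review
    """
    memory_dir.mkdir(parents=True, exist_ok=True)
    src = journal / "memories" / project
    if src.is_dir():
        for note in sorted(src.glob("*.md")):
            shutil.copy2(note, memory_dir / note.name)
    pending = journal / "proposals" / "pending"
    return sorted(p.stem for p in pending.glob("*.md")) if pending.is_dir() else []
```

A session-start hook could call this after a `git pull` and print the returned names as the accept, skip, or edit prompt.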

There's a discipline that falls out of this. The assistant reads the distilled outputs, never the raw breadcrumbs. The transcripts are the consolidator's input, not a search corpus to grep when memory comes up short. The journal is meant to be a source of truth for long-horizon work without quietly fencing in what the assistant is willing to explore. The gist is kept close, and the verbatim is allowed to fade.

What it is, honestly

This is a personal system, not a product. It assumes you're comfortable with git, it leans on a scheduled cloud routine as its one expensive step, and the entire data repo is encrypted precisely because it is a detailed record of how I work. That record never leaves my own machines and my own private repo in readable form. The proposal queue exists because an automated system that rewrites its own instructions is exactly the kind of thing that should make you nervous. The answer was to keep a human in the loop for the changes that carry reach, and to automate only the ones that don't.

[Figure: two layers of memory over six weeks, with an inset showing a one-time correction becoming a standing rule the next day.]

Six weeks on three machines, drawn from the actual git history and the on-disk memories. The journal's distilled store climbs from 4 to 122 and keeps accelerating, while local auto memory sawtooths, building up on one machine and then dropping to nothing each time I switch devices, because none of it crosses over. A correction made once becomes a standing rule the next day. Two layers, and only one of them travels with me.

[Figure: tokens the journal loads into each session, one dot per project, plotted against how many memories it has accumulated, with an inset on prompt-cache pricing.]

The same accumulation carries a price. Each dot is a project: the memory the journal loads into every session, set against how much it has learned. Most stay under 3k tokens, one heavy project loads 47k every time, and a few bloated indexes lift off on only a handful of memories. In practice a typical session spends only about 17 to 30 cents carrying all of this, because nearly all of it is read from the prompt cache at roughly a tenth of list price, while the heavy work is batched into the nightly consolidator.
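To make the "tenth of list price" arithmetic concrete, here is a back-of-envelope sketch. It assumes the loaded memory is re-read as a cache hit on every request in a session and uses a hypothetical $3 per million input tokens; substitute your model's actual rate. It ignores the one-time cache write and everything else in the context, so it is an illustration of the scaling, not a reproduction of the figures above.

```python
def carry_cost_usd(
    context_tokens: int,
    turns: int,
    input_usd_per_mtok: float,
    cache_read_multiplier: float = 0.1,
) -> float:
    """Approximate cost of re-reading `context_tokens` of loaded memory on each
    of `turns` requests, billed as prompt-cache reads.

    Simplification: ignores the one-time cache write and all non-memory tokens.
    """
    return context_tokens * turns * input_usd_per_mtok / 1_000_000 * cache_read_multiplier


# A 47k-token memory load over 10 requests at a hypothetical $3/MTok:
cached = carry_cost_usd(47_000, 10, 3.0)                              # about $0.14
uncached = carry_cost_usd(47_000, 10, 3.0, cache_read_multiplier=1.0)  # about $1.41
```

The cost grows with both the size of the load and the number of requests, which is why the heavy project in the figure matters more than the many small ones.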

The result is quiet and, after a few weeks, slightly uncanny. I open a project I haven't touched in a fortnight, on a machine I didn't use last time, and the assistant already knows the constraints I would otherwise be explaining from scratch, and reaches for the methods we worked out together. It remembers. Not everything, but the parts worth remembering, which, when I think about it, is the only kind of memory that was ever useful in the first place.

Reference, the technical details

For the curious, here's the machinery. The tooling is open source at claude-journal-tools, and I would love help making it better, so issues, ideas, and pull requests are all genuinely welcome.

Capture

When a Claude Code session ends, a Stop hook on the device writes two files into raw/<device>/<date>/. The first is a structural breadcrumb, facts only and no prose.

{
  "session_id": "…",
  "device": "laptop",
  "project": "-home-me-someproject",
  "started_at": "2026-06-08T09:14:32Z",
  "ended_at":   "2026-06-08T09:51:07Z",
  "files_touched": ["src/auth.ts", "src/session.ts"],
  "skills_invoked": ["debugging", "code-review"],
  "first_prompt": "the login flow drops the session on refresh"
}

The second is a copy of the conversation, truncated to roughly its last 30 KB and split into user and assistant turns. The breadcrumb says what happened, while the transcript tail is what the consolidator reads to recover preferences, decisions, and feedback. Both are committed and pushed. That is the entire cost on the device side: two files and a push, with no summarization and no network dependency at the moment a session closes.
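The device-side write is small enough to sketch in a few lines. This assumes the hook has already assembled the breadcrumb dict and rendered the transcript as text; the per-session file names are hypothetical, the `<date>` folder is assumed to be the UTC date of `ended_at`, and the hook plumbing and the git commit and push are left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

TAIL_BYTES = 30 * 1024  # "roughly 30 KB from the tail"


def tail_utf8(text: str, limit: int = TAIL_BYTES) -> str:
    """Keep the last `limit` bytes of `text`, dropping any partial UTF-8
    character left at the cut point."""
    data = text.encode("utf-8")[-limit:]
    return data.decode("utf-8", errors="ignore")


def write_capture(root: Path, breadcrumb: dict, transcript: str) -> Path:
    """Write the breadcrumb and transcript tail under raw/<device>/<date>/.

    Assumptions: <date> is the UTC date of `ended_at`, and the file names
    <session_id>.json and <session_id>.transcript.txt are hypothetical.
    """
    ended = datetime.fromisoformat(breadcrumb["ended_at"].replace("Z", "+00:00"))
    day = ended.astimezone(timezone.utc).date()
    out = root / "raw" / breadcrumb["device"] / day.isoformat()
    out.mkdir(parents=True, exist_ok=True)
    stem = breadcrumb["session_id"]
    (out / f"{stem}.json").write_text(json.dumps(breadcrumb, indent=2), encoding="utf-8")
    (out / f"{stem}.transcript.txt").write_text(tail_utf8(transcript), encoding="utf-8")
    return out
```

Cutting on bytes rather than characters keeps the size bound honest for non-ASCII transcripts, and `errors="ignore"` discards the half-character a byte cut can leave at the front.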

The encrypted spine

Everything lives in one private git repository, encrypted at rest with git-crypt. That includes both the raw inputs and every distilled output. Git gives me versioning, sync through tools I already trust, and no server to run. Encryption is not negotiable, because the repo is by design a thorough record of how and what I work on.

The four tracks

The consolidator is a Claude Code routine, a scheduled cloud agent created with /schedule and set up once per account. Each run clones the repo, unlocks it with a git-crypt key supplied through the environment (never written to disk in cleartext, never logged), reads the breadcrumbs for the target window, and writes four kinds of output.

Digests. A short summary, per device and per day, of what was worked on.
Memories. Declarative facts about me, the project, or external systems, applied automatically. The one exception is inferred behavioral feedback, which is routed to the proposal queue instead.
Skill proposals. A reusable technique, proposed only when it recurs across at least two sessions on at least two days, with its provenance attached.
CLAUDE.md edits. Proposed changes to a project's instruction file when my actual behavior has drifted from what the file claims.

Anything that changes behavior, meaning feedback rules, skills, and edits to instruction files, is a proposal I resolve with /journal accept, /journal skip, or /journal edit. Plain facts apply on their own. A distilled memory is just markdown with frontmatter.

---
name: terse-responses
description: How the user wants answers formatted
type: feedback
---

Prefer terse, direct answers; skip preamble and restated questions.

**Why:** said multiple times that long lead-ins waste their reading time.
**How to apply:** lead with the answer; add detail only if asked.
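Reading one of these notes back needs very little machinery. The sketch below handles only flat `key: value` frontmatter like the example above; the function name is hypothetical, and a real tool might reasonably use a full YAML parser instead.

```python
def parse_memory(text: str) -> tuple[dict[str, str], str]:
    """Split a memory note into its frontmatter fields and its markdown body.

    Handles only flat `key: value` lines between the two '---' fences.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---'") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

With the note parsed, the `type` field is what a consolidator could inspect to decide whether the memory applies directly or waits in the proposal queue.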

Cadence and idempotency

Originally the routine ran once a night and assumed the day to process was "yesterday, in UTC," appending new proposals to a file for that day. To make distilled output reach my other devices faster, I wanted to run it several times a day, and that broke both assumptions.

First, what each run processes changed from "yesterday" to a rolling window of yesterday plus today. A late run still finishes yesterday, while intraday runs refresh today as breadcrumbs accumulate. (An explicit force-date override remains for reprocessing a specific day.)
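The window logic and the read it drives can be sketched together. This is a minimal illustration, assuming breadcrumbs sit as `*.json` files under `raw/<device>/<date>/` in an already cloned and unlocked repo; `force_date` stands in for the override and is a hypothetical name.

```python
import json
from datetime import date, datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def target_days(now: datetime, force_date: Optional[date] = None) -> list[date]:
    """Days one run processes: the forced date alone, else yesterday and today in UTC.

    `now` must be timezone-aware so the UTC conversion is unambiguous.
    """
    if force_date is not None:
        return [force_date]
    today = now.astimezone(timezone.utc).date()
    return [today - timedelta(days=1), today]


def load_breadcrumbs(root: Path, days: list[date]) -> list[dict]:
    """Read every device's breadcrumbs for `days` from raw/<device>/<date>/*.json."""
    crumbs: list[dict] = []
    raw = root / "raw"
    if not raw.is_dir():
        return crumbs
    for device_dir in sorted(p for p in raw.iterdir() if p.is_dir()):
        for day in days:
            day_dir = device_dir / day.isoformat()
            if day_dir.is_dir():
                for f in sorted(day_dir.glob("*.json")):
                    crumbs.append(json.loads(f.read_text(encoding="utf-8")))
    return crumbs
```

Reading every device in one pass is the point of running centrally: the consolidator sees the laptop's and the desktop's sessions side by side.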

Second, appending isn't idempotent. Run twice on one day and you duplicate every proposal. So every write became an upsert keyed by its source, a feedback proposal by its originating session, a skill by its name, a digest by its date and device. Running again refreshes the existing entry in place rather than stacking a copy beside it, which is, pleasingly, just reconsolidation, where a recalled memory becomes editable and is stored again rather than duplicated. A run that changes nothing commits nothing, so frequent runs don't litter the history.
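Here is a minimal sketch of the upsert idea. As a simplification it keeps all entries in one JSON object file rather than the separate files a real journal would use, and `source_key` and `upsert` are hypothetical names. The return value tells the caller whether anything changed, which is what lets a no-op run skip its commit.

```python
import json
from pathlib import Path


def source_key(kind: str, *, session_id: str = "", name: str = "", day: str = "", device: str = "") -> str:
    """Key each output by its source: feedback by session, skill by name,
    digest by date and device."""
    if kind == "feedback":
        return f"feedback:{session_id}"
    if kind == "skill":
        return f"skill:{name}"
    if kind == "digest":
        return f"digest:{day}:{device}"
    raise ValueError(f"unknown kind: {kind}")


def upsert(path: Path, key: str, entry: dict) -> bool:
    """Insert or replace `entry` under `key` in a JSON object file.

    Returns True only if the file changed, so a caller can skip the commit.
    """
    store = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    if store.get(key) == entry:
        return False  # identical entry already present: nothing to write
    store[key] = entry
    path.write_text(json.dumps(store, indent=2, sort_keys=True), encoding="utf-8")
    return True
```

Running the same consolidation twice produces the same keys, so the second run either rewrites an entry in place or reports no change.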

The platform sets the bounds. Routines can't run more often than hourly, and there's a daily cap per account (15, on my tier). The scheduler asks how many runs a day I want, keeps it under those limits, and emits a cron expression. Every six hours, for example, looks like this.

30 */6 * * *   # 00:30, 06:30, 12:30, 18:30 local, four runs a day

At least one run is guaranteed to fall after UTC midnight, so yesterday always gets finished. Once a day stays the default, because for most projects it's the right answer, and the rest is there for the ones that move fast across machines.
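The scheduler's exact logic isn't spelled out here, but one way to turn "runs per day" into a cron expression within those bounds looks like this. It keeps the example's minute of 30, uses a `*/step` hour field when the runs divide the day evenly, and otherwise lists explicit hours. Running once a day at hour 0 is an assumption, and this sketch does not check the after-UTC-midnight guarantee, which depends on the device's local offset.

```python
HOURLY_MAX = 24  # routines cannot run more often than hourly


def cron_for(runs_per_day: int, daily_cap: int = 15, minute: int = 30) -> str:
    """Spread `runs_per_day` runs across the day, clamped to the hourly
    floor and the per-account daily cap (15 in the text, tier-dependent)."""
    n = max(1, min(runs_per_day, daily_cap, HOURLY_MAX))
    if 24 % n == 0:
        step = 24 // n
        hours = "0" if step == 24 else f"*/{step}"
    else:
        # Evenly spaced whole hours; the spacing is always at least one hour.
        hours = ",".join(str(i * 24 // n) for i in range(n))
    return f"{minute} {hours} * * *"
```

For four runs a day this reproduces the expression above, `30 */6 * * *`, and a request for more runs than the cap allows is quietly reduced to the cap.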