Overview

An LLM wiki is a knowledge base in which a language model builds and maintains all structured documentation from raw source material. Introduced by Andrej Karpathy, the pattern uses three layers: an immutable raw/ directory of source documents, a wiki/ directory of LLM-generated markdown pages, and a schema file (CLAUDE.md) that encodes structural rules and naming conventions. Three core operations drive the system: ingest (process new sources into the wiki), query (answer questions by navigating from an index file), and lint (run health checks for contradictions, orphan pages, and stale claims). The human curates inputs and makes quality decisions; the model handles all organisational administration.

The approach is explicitly positioned as a lightweight alternative to second brain / Zettelkasten methods for the 50–200 source-document scale, where knowledge compounding across ingests is more valuable than raw retrieval speed.

Three-layer architecture
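
The three layers named in the overview (raw/, wiki/, and the CLAUDE.md schema file) can be sketched as a small scaffolding script. This is a minimal illustration, not the layout from the gist: placing index.md and log.md inside wiki/ and the placeholder schema text are assumptions, and `scaffold` is a hypothetical helper name.

```python
from pathlib import Path


def scaffold(root: Path) -> None:
    """Create the three layers: raw/, wiki/, and the schema file."""
    # Layer 1: immutable source documents.
    (root / "raw").mkdir(parents=True, exist_ok=True)
    # Layer 2: LLM-generated markdown pages (wiki/sources/ holds per-source summaries).
    (root / "wiki" / "sources").mkdir(parents=True, exist_ok=True)
    # Assumption: index.md and log.md live at the top of wiki/.
    for name in ("index.md", "log.md"):
        page = root / "wiki" / name
        if not page.exists():
            page.write_text("", encoding="utf-8")
    # Layer 3: the schema file with structural rules and naming conventions.
    schema = root / "CLAUDE.md"
    if not schema.exists():
        schema.write_text("# Schema\n", encoding="utf-8")
```

Calling `scaffold(Path("my-wiki"))` twice is safe: existing files and directories are left untouched.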

Three operations

Ingest

Process new files placed in raw/: summarise into wiki/sources/, propagate insights to concept and entity pages, refresh index.md, append to log.md. A single ingest can modify dozens of wiki pages as implications cascade across the knowledge network.
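
The bookkeeping part of an ingest could look roughly like the sketch below. It covers only the summary page, the index entry, and the log entry; propagating insights to concept and entity pages is left out because that step is the model's work. The `summarise` callable stands in for the LLM call, and the page template, index line format, and log line format are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path
from typing import Callable


def ingest(root: Path, summarise: Callable[[str], str]) -> list[str]:
    """Summarise every raw file that has no page in wiki/sources/ yet."""
    wiki = root / "wiki"
    sources = wiki / "sources"
    sources.mkdir(parents=True, exist_ok=True)
    processed = []
    for raw_file in sorted((root / "raw").iterdir()):
        if not raw_file.is_file():
            continue
        page = sources / f"{raw_file.stem}.md"
        if page.exists():
            continue  # already ingested; raw/ is treated as immutable
        # `summarise` is a hypothetical stand-in for the LLM call.
        summary = summarise(raw_file.read_text(encoding="utf-8"))
        page.write_text(
            f"# {raw_file.stem}\n\nSource: raw/{raw_file.name}\n\n{summary}\n",
            encoding="utf-8",
        )
        # Assumed index format: one wiki-link per line.
        with (wiki / "index.md").open("a", encoding="utf-8") as index:
            index.write(f"- [[{raw_file.stem}]]\n")
        # Assumed log format: ISO date plus the raw path.
        with (wiki / "log.md").open("a", encoding="utf-8") as log:
            log.write(f"- {date.today().isoformat()} ingested raw/{raw_file.name}\n")
        processed.append(raw_file.name)
    return processed
```

Because the function skips any source that already has a summary page, running it again after a partial ingest only picks up the files that are still missing.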

Query

Navigate via index.md, read targeted pages, synthesise an answer with [[wiki-link]] citations. Particularly insightful responses can be saved as new wiki pages, so knowledge accumulates rather than evaporating.
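
One way to picture the navigation step is a helper that shows the index to the model, lets it name the pages it wants, and loads only those pages as context. This is a sketch under assumptions: `choose` is a hypothetical callable standing in for the LLM's page selection, pages are identified by file stem, and the `## [[name]]` separator is an arbitrary choice.

```python
from pathlib import Path
from typing import Callable


def gather_context(
    wiki: Path,
    question: str,
    choose: Callable[[str, str], list[str]],
) -> str:
    """Read index.md, ask `choose` for page names, and load only those pages."""
    index_text = (wiki / "index.md").read_text(encoding="utf-8")
    # Map page name (file stem) to its path anywhere under wiki/.
    pages = {p.stem: p for p in wiki.rglob("*.md")}
    parts = []
    for name in choose(index_text, question):
        if name not in pages:
            continue  # the model named a page that does not exist; skip it
        body = pages[name].read_text(encoding="utf-8")
        parts.append(f"## [[{name}]]\n{body}")
    return "\n\n".join(parts)
```

The returned string would then be handed to the model along with the question, so that the answer can cite the loaded pages by their [[wiki-link]] names.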

Lint

Periodic health check: scan for contradictions between pages, orphan pages (no incoming links), missing concepts (referenced but no page exists), stale claims superseded by newer sources, and investigation gaps. Analogous to ESLint, but for documentation.
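
Two of these checks, orphan pages and missing concepts, are purely structural and can be sketched without a model; contradictions and stale claims need the LLM to read the content. The sketch below makes several assumptions: pages are identified by file stem, links use the `[[name]]`, `[[name|alias]]`, or `[[name#heading]]` forms, and index.md and log.md are skipped entirely (the index links to every page, so counting its links would hide all orphans). `lint_links` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Captures the target of [[name]], [[name|alias]], and [[name#heading]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")
# Assumption: these bookkeeping files are neither checked nor counted as linkers.
STRUCTURAL = {"index", "log"}


def lint_links(wiki: Path) -> dict[str, list[str]]:
    """Report orphan pages and links that point at pages that do not exist."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki.rglob("*.md")}
    incoming = {name: 0 for name in pages}
    missing = set()
    for name, text in pages.items():
        if name in STRUCTURAL:
            continue
        targets = {m.strip() for m in WIKI_LINK.findall(text)}
        for target in targets:
            if not target or target == name:
                continue  # ignore empty links and self-links
            if target in pages:
                incoming[target] += 1
            else:
                missing.add(target)
    orphans = sorted(
        name for name, count in incoming.items()
        if count == 0 and name not in STRUCTURAL
    )
    return {"orphans": orphans, "missing": sorted(missing)}
```

The result is a plain dictionary, so it can be appended to log.md or passed to the model as a starting list of pages to repair.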

LLM wiki vs RAG

The LLM wiki is a stateful alternative to retrieval-augmented generation for individual/small-team scale:

| Dimension | RAG | LLM wiki |
| --- | --- | --- |
| State | Stateless per query | Stateful: knowledge compounds |
| Infrastructure | Vector DB + embedding pipeline | Folder of .md files |
| Cross-references | Discovered ad hoc | Pre-built by the LLM |
| Contradictions | Undetected | Flagged during lint |
| Scale sweet spot | Enterprise (millions of docs) | Personal/team (<200 docs) |
| Traceability | Chunk-level (often lossy) | Source-level citations to raw/ |

At small scale the LLM wiki wins because queries are cheap (index + targeted pages), every claim traces back to a source, and contradictions surface during lint. RAG wins when the corpus is too large to pre-integrate or when queries need sub-second latency at enterprise scale.

Intellectual lineage

The pattern completes Vannevar Bush’s 1945 vision of the Memex — an associative machine that correlates a person’s entire knowledge corpus. The Memex remained impractical because every link had to be created manually. The LLM wiki makes the cost of maintenance near zero by having the model generate and update all links during each ingest operation.

Karpathy frames this as the third step in a progression:

  1. Vibe Coding (2025) — trust LLM-generated code
  2. Agentic Engineering (2026) — humans supervise agents, not code
  3. LLM Knowledge Bases (2026) — machines manage information, humans curate

Criticisms

Karpathy’s own framing

Karpathy describes the working setup as: “Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.” The human browses in real time; the LLM makes edits.

The human’s job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM’s job is everything else.

Key use cases from the gist:

Tooling notes (from the gist)

Resources