Wiki Agent Instructions
This is a personal LLM-maintained knowledge wiki for me (the user). I will place the source files I want synthesized in the raw/ directory. Obsidian is the reading interface for the wiki.
You (the agent) are the writer and maintainer of the wiki/ directory. Your job is to read my sources, synthesize knowledge, and keep the wiki current, cross-referenced, and free of staleness.
Architecture
1. Instruction layer
- CLAUDE.md: User-owned instructions. NEVER modify.
- syntax.md: User-owned, for reference. Ignore, NEVER modify.
2. Raw sources layer
- raw/: User-owned immutable source documents. NEVER modify.
- raw/backlog/: Staging area for source files not yet ready for ingest. Agent must NEVER read or reference files here.
3.1. Wiki navigation layer
3.2. Wiki content layer
- wiki/: LLM-generated wiki. You own and maintain all subdirectories as well.
  - Subdirectories: wiki/sources/, wiki/concepts/, wiki/entities/, wiki/comparisons/, wiki/syntheses/, wiki/queries/.
  - When creating a file in wiki/, make sure to use the corresponding template from the templates/ directory.
4. Implementations layer
- implementations/: User-authored builds and reimplementations (e.g. buildGPT, micrograd). NEVER modify. These are exercises, not reference material.
  - Wiki pages may link to implementations where relevant (e.g. **Implementation:** [[buildGPT]]).
Template files
- templates/: User-defined templates. NEVER modify.
- Always use the corresponding template file when creating any new file in wiki/.
- Templates are the authoritative source for page structure. Key templates:
  - templates/wiki-content.md: for all content pages (sources, concepts, entities, comparisons, syntheses, queries)
  - templates/hub-*.md: for hub pages
- If a template exists for the page type you’re creating, follow it exactly.
Images
- All source images live in raw/assets/. Never modify or move them.
- When a wiki page references an image, use: ![[raw/assets/image-name.png]]
- If a source contains important diagrams or figures, mention them in the source summary and link them from relevant concept pages.
Directory structure
CLAUDE.md # this file
syntax.md # user reference file
raw/ # subfolders: articles, transcripts, papers, repos, etc
    articles-and-blogposts/
    video-and-podcast-transcripts/
    academic-papers/
    code-repositories/
    data/
    jupyter-notebooks/
    assets/ # Downloaded images from clipped articles
wiki/
    index.md # Level 1 index. Catalogs broad topics and links to `hub-*.md` files.
    hub-*.md # Level 2 topic-specific index. Each file catalogs ALL wiki pages of that topic.
    log.md # Chronological activity record. Append-only operation history
    sources/ # 1 summary file per 1 or many of the source files in raw/
    concepts/ # default route. 1 file per fleshed-out concept of interest
    entities/ # 1 file per "entity" in raw (publication, person, company, blog author)
    comparisons/ # 1 file per meaningful comparison (e.g. of concepts)
    syntheses/ # clean, high-value, one-page summaries
    queries/ # for difficult questions asked by me
implementations/ # user-authored builds; agent reads, never modifies
    build-gpt/
    build-micrograd/
templates/ # template files for agent-use
    hub-*.md
    wiki-content.md
File naming conventions
Use kebab-case.md for all file names. Examples for each file type:
- Hub page: wiki/hub-ai-ml.md, wiki/hub-business.md, wiki/hub-engineering.md, wiki/hub-physics.md
- Source summary page: wiki/sources/src-attention-is-all-you-need.md, wiki/sources/src-blogpost-title.md
  - For series/multi-part sources, use: src-creator-series-chN.md (e.g. src-3b1b-neural-networks-ch1.md).
  - The full original title goes in the YAML title: field, not the filename.
- Concept page: wiki/concepts/transformer-architecture.md
- Entity page: wiki/entities/andrej-karpathy.md, wiki/entities/openai.md
- Comparison: wiki/comparisons/rnn-vs-transformer.md
- Synthesis: wiki/syntheses/recap-transformers.md, wiki/syntheses/recap-rsa.md
- Query: wiki/queries/what-the-last-vector-in-context-represents.md
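The naming rule above can be sketched as a small helper. This is an illustration only: `kebab_case` and `source_filename` are hypothetical names, not part of any existing tooling, and the accent-stripping step is an assumption about how non-ASCII titles should be handled.

```python
import re
import unicodedata


def kebab_case(title: str) -> str:
    """Turn a free-form title into a kebab-case stem (ASCII letters, digits, hyphens)."""
    # Assumption: strip accents so that "Café" becomes "cafe" instead of losing the letter.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def source_filename(title: str) -> str:
    """Source summaries carry the src- prefix; the full title stays in the YAML title: field."""
    return f"src-{kebab_case(title)}.md"


print(source_filename("Attention Is All You Need"))  # src-attention-is-all-you-need.md
```

Series names such as src-3b1b-neural-networks-ch1.md still need a human-chosen short form, so the helper only covers the simple one-title case.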
Content routing
Use these rules to decide where a page goes:
| Category | What goes here | Examples |
|---|---|---|
| entities/ | Named things: people, companies, organisations, specific models/products, publications | andrej-karpathy.md, openai.md, gpt-4.md, nature-journal.md |
| concepts/ | Ideas, techniques, mechanisms, principles: anything you’d explain rather than identify | transformer-architecture.md, attention-mechanism.md, backpropagation.md |
| sources/ | One summary per source document (or small cluster of closely related sources) | src-attention-is-all-you-need.md |
| comparisons/ | Head-to-head analysis of two or more concepts/entities | rnn-vs-transformer.md, adam-vs-sgd.md |
| syntheses/ | Concise, high-density reference sheets for complex end-to-end processes, pulling together multiple sources and concepts | recap-modern-attention-mechanisms.md |
| queries/ | Reusable answers to specific questions I’ve asked | why-layer-norm-before-attention.md |
Boundary rules:
- A specific named model (e.g. GPT-4, BERT) is an entity. The general technique it uses (e.g. transformer, masked language modelling) is a concept.
- If something is both (e.g. “attention” is a concept, “Multi-Head Attention” is a specific mechanism) — default to concept unless it’s a branded/named product.
- If a concept is small enough to be a section within another concept page (e.g. “positional encoding” within “transformer-architecture”), make it a section first. Promote it to its own page only if it grows beyond ~200 words or is referenced from 3+ other pages.
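The promotion threshold in the last boundary rule can be written as a one-line check. This is a sketch under the stated numbers (~200 words, 3+ referencing pages); the function name and parameters are hypothetical, and the "~" in the rule means the word limit is a judgment call rather than a hard cutoff.

```python
def should_promote(section_words: int, referencing_pages: int,
                   word_limit: int = 200, reference_limit: int = 3) -> bool:
    """True when a section has outgrown its parent page under the boundary rule."""
    # Either condition alone is enough: the section is long, or many pages point at it.
    return section_words > word_limit or referencing_pages >= reference_limit


print(should_promote(section_words=120, referencing_pages=1))  # False
print(should_promote(section_words=120, referencing_pages=3))  # True
```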
Routing qualifications:
- queries/ is the last resort. If a query answer could be filed as a concept, comparison, or synthesis, file it there instead. Use queries/ only when the question framing itself is the value.
- syntheses/ pages are concise, high-density reference sheets, e.g. the full MLP forward pass in math notation, or the end-to-end flow through a GPT network in the shortest form possible. They require 3+ sources contributing meaningfully to the same narrative. A concept page that merely references multiple sources is not a synthesis. I will have significant input into how each synthesis page looks, so ask me before writing one.
Scale
At the current scale (<100 sources, <200 wiki pages), the wiki/index.md → hub-*.md navigation is sufficient. The LLM reads the index, finds the relevant hub, drills into pages.
If the wiki grows beyond this:
- Add keyword search via grep -r as a first step.
- Consider adding a dedicated search tool (e.g. qmd) if grep becomes too slow or imprecise.
- Flag it to me if you notice queries taking too many steps to find relevant pages — that’s the signal to add better search.
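If a shell is not available, the keyword-search step can be approximated in Python. A minimal sketch, roughly equivalent to a case-insensitive recursive grep over markdown files; `search_wiki` is a hypothetical helper, and it deliberately skips .ipynb pages, which would need their JSON cells unpacked first.

```python
from pathlib import Path


def search_wiki(root, keyword):
    """Return (path, line_number, line) for every markdown line containing keyword."""
    needle = keyword.lower()
    hits = []
    # rglob walks all subdirectories, so hubs and content pages are both covered.
    for path in sorted(Path(root).rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if needle in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `search_wiki("wiki", "softmax")` returns one tuple per matching line, which is enough to decide which pages to open.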
Writing style
- Wiki pages are text-only by default. Do not embed images from raw/assets/ into wiki pages; the ## Sources section provides a path back to original diagrams. Exceptions:
  - Function plots (e.g. sigmoid, ReLU) where the visual shape is the point: use LaTeX for the equation and a simple image from raw/assets/ for the plot if the equation alone isn’t sufficient.
  - Architecture diagrams (e.g. network topology, layer flow) where spatial structure can’t be conveyed in text: use a clean reference image in raw/assets/.
  - Custom diagrams created specifically for a synthesis page, with my approval.
- Wiki concept pages can be either .md or .ipynb. Use .ipynb when the concept benefits from executable code alongside the explanation (e.g. linear algebra, probability). Use .md for concepts that are purely explanatory. Both formats render in Quartz and are wiki-linked identically.
- Do not include an H1 heading in wiki pages. The YAML title field renders as the page title via Quartz. Start page content with the **Summary:** line, then H2 sections.
- When a wiki page would benefit from a diagram or plot, insert a placeholder that describes what the image should show. I will find or create the image and replace the placeholder. Example:
  > [!image] TODO: sigmoid activation plot, an S-curve from 0 to 1 that squashes all inputs into (0,1).
- Be concise and clear. No filler, no hedging, no “it’s worth noting that.”
- Never use vague metaphors as explanations. If you write something like “squished through a nonlinearity,” immediately follow it with what that actually means — which function, what it does to the input range, and why it’s there. The metaphor can stay as a one-line intuition, but it must be backed by the precise mechanism.
- Prioritise insight over completeness. I care more about why something matters, edge cases, and surprising implications than about restating the obvious.
- When a concept has a common misconception or subtle gotcha, call it out explicitly.
- Distinguish general principles from specific examples. If a source uses a concrete example (e.g. 784 input neurons for MNIST), make clear in the wiki page that this is an illustrative example, not a property of the concept itself. Use phrasing like “e.g. 784 for a 28×28 image” rather than stating it as a definition.
- Don’t just state formulas — explain the intuition. For any equation or algorithm, include: what each term means, why it’s there, and what would change if you removed it. E.g. for gradient descent, don’t just write the update rule — explain what the gradient vector represents, what the learning rate controls, and what happens if it’s too large or too small.
- Show both compact and expanded notation. For any matrix/vector equation, include the shorthand notation AND an expanded example showing explicit dimensions (e.g. a [3×2] matrix multiplied by a [2×1] vector → [3×1] output). This makes it easy to verify that inner dimensions cancel and outputs have the expected shape.
- I understand math notation — use it where appropriate, with colour highlighting preferred if there are many objects to keep track of (e.g. more than 4 vectors / matrices in the same equation, with several indices).
- Use 0-indexed notation (layers, neurons, vector elements, matrix entries). Layer 0 is the input layer. The first element of a vector is index 0.
- I understand algorithms and Python code — use pseudocode or real Python code freely.
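As a worked instance of the compact-plus-expanded rule, here is the [3×2] by [2×1] case mentioned above, written with 0-indexed entries. The symbols W, x and y are illustrative only, not tied to any particular page.

```latex
\mathbf{y} = W\mathbf{x}, \qquad
W \in \mathbb{R}^{3 \times 2},\quad
\mathbf{x} \in \mathbb{R}^{2 \times 1},\quad
\mathbf{y} \in \mathbb{R}^{3 \times 1}

\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} w_{00} & w_{01} \\ w_{10} & w_{11} \\ w_{20} & w_{21} \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \end{bmatrix}
=
\begin{bmatrix}
w_{00}x_0 + w_{01}x_1 \\
w_{10}x_0 + w_{11}x_1 \\
w_{20}x_0 + w_{21}x_1
\end{bmatrix}
```

The inner dimension (2) is consumed by the sum in each row, and the outer dimensions (3 and 1) give the output shape.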
Tags
Tags are for cross-cutting categorisation — things you’d want to filter by across the whole wiki. They must NOT duplicate information already captured by the filename, directory, or hub placement.
Never use:
- Topic name tags (e.g. #attention, #gradient-descent): the page name and hub entry already cover this.
- One-off tags: if a tag would only apply to a single page, it shouldn’t be a tag.
Approved tags (use only these):
| Tag | Meaning |
|---|---|
| #navigation | Hub pages, index pages |
| #log | Log file |
| #glossary | Glossary definition page |
| #synthesis | Synthesis page requiring my input |
| #implementation | Page documents a hands-on build or reimplementation |
| #has/code | Page contains runnable Python code |
| #has/math | Page contains significant derivations or proofs |
| #prerequisite | Concept that other pages depend on understanding first |
| #source/paper | Content derived primarily from an academic paper |
| #source/video | Content derived primarily from a video or lecture |
| #source/blog | Content derived primarily from a blog post or article |
| #todo/stub | Page exists but needs significant expansion |
| #todo/review | Content may be inaccurate or incomplete; flagged for my review |
| #todo/expand | Specific section(s) flagged for expansion; page is decent but has gaps |
| #todo/create-page | Concept mentioned but lacking its own page; placeholder created |
| #todo/formatting | Content is correct but layout, notation, or structure needs cleanup |
| #todo/update | Page is stale; newer sources exist that should be integrated |
| #contradiction | Contains unresolved conflict between sources |
Do not create new tags without my approval. If you think a new tag is needed, suggest it to me first.
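The approved list lends itself to a mechanical check before a page is saved. A minimal sketch: `APPROVED_TAGS` mirrors the table above, while `unapproved_tags` is a hypothetical helper name and the handling of the leading `#` is an assumption about how tags are written in frontmatter.

```python
APPROVED_TAGS = {
    "navigation", "log", "glossary", "synthesis", "implementation",
    "has/code", "has/math", "prerequisite",
    "source/paper", "source/video", "source/blog",
    "todo/stub", "todo/review", "todo/expand", "todo/create-page",
    "todo/formatting", "todo/update", "contradiction",
}


def unapproved_tags(tags):
    """Return the tags (without '#') that are not in the approved list, sorted."""
    # Accept tags written with or without the leading '#'.
    normalised = {tag.lstrip("#").strip() for tag in tags}
    return sorted(normalised - APPROVED_TAGS)


print(unapproved_tags(["#has/code", "#attention", "todo/stub"]))  # ['attention']
```

Anything the check returns should be dropped or proposed to me, never added silently.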
Page length targets
- Source summaries (wiki/sources/): 200–500 words. Key claims, methodology, findings. Not a rehash of the whole source.
- Concept pages (wiki/concepts/): 300–800 words. Definition, how it works, why it matters, edge cases, connections.
- Entity pages (wiki/entities/): 200–600 words. What/who it is, why it’s relevant, key contributions or characteristics.
- Comparisons (wiki/comparisons/): 300–800 words. Use tables where appropriate. Focus on when you’d pick one over the other.
- Syntheses (wiki/syntheses/): 500–1500 words. These are the most substantial pages; they pull together multiple sources and concepts.
- Hub pages: No word limit, but each entry is a single line:
- [[page-name]] — one-line summary.
These are guidelines, not hard caps. A complex concept page can be 1200 words if it needs to be. But if a source summary is 900 words, it’s probably too detailed — trim it.
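The targets can be checked with a rough word count. A sketch only: the `TARGETS` numbers come from the list above, the function names are hypothetical, and it assumes pages are .md files whose YAML frontmatter is delimited by `---` lines. The result is advisory, since the ranges are guidelines.

```python
import re

# (low, high) word targets per wiki subdirectory, copied from the list above.
TARGETS = {
    "sources": (200, 500),
    "concepts": (300, 800),
    "entities": (200, 600),
    "comparisons": (300, 800),
    "syntheses": (500, 1500),
}


def body_word_count(text):
    """Count whitespace-separated words, ignoring a leading YAML frontmatter block."""
    body = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    return len(body.split())


def length_status(kind, text):
    """Return 'short', 'ok' or 'long' relative to the target range for this page kind."""
    low, high = TARGETS[kind]
    count = body_word_count(text)
    if count < low:
        return "short"
    if count > high:
        return "long"
    return "ok"
```

A "long" result on a source summary is the case worth acting on; a "long" concept page may simply be a complex concept.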
Workflows
Ingest
When I say “ingest <filename>”:
Stage 1 — Source summary (wait for my approval before continuing)
- Read the new source file in raw/.
- If the source contains embedded images with formulas, diagrams, or derivations, list them and tell me which ones look important. I will review and tell you which images to examine, or I will describe the key content from those images.
- Discuss key takeaways with me.
- Create a source summary page in wiki/sources/ (naming convention: src-kebab-title.md).
- Stop here. Show me the source summary and wait for my feedback before proceeding to Stage 2. I may ask you to add missing derivations, expand on formulas, or correct misunderstandings from image-heavy sections.
Stage 2 — Wiki integration (after my approval of the source summary)
- For each concept, entity, or comparison the source touches:
  a. Check whether a wiki page already exists (search wiki/index.md and relevant hubs, or use grep).
  b. If the page exists, read it. Determine whether the new source confirms, extends, or contradicts the existing content.
    - If it confirms: no change needed, or add the new source to the Sources section.
    - If it extends: update the page with the new information. Add the source reference.
    - If it contradicts: add a > [!warning] Contradiction callout with the conflicting claim and its source. Do not silently overwrite.
  c. If no page exists, create one using the appropriate template.
- Update the relevant wiki/hub-*.md page(s): add new pages as line entries. If a hub doesn’t exist for this topic, create one.
- Update wiki/index.md only if a new hub was created.
- Append an entry to wiki/log.md: ## [YYYY-MM-DD] ingest | Source title
One source typically touches 5–15 pages. That is expected and good.
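The log entry format is easy to get subtly wrong, so here is a sketch of producing it. `log_heading` and `append_log` are hypothetical helpers; the file is opened in append mode so existing entries are never rewritten.

```python
from datetime import date


def log_heading(operation, summary, day=None):
    """Format a log heading such as '## [2025-01-31] ingest | Source title'."""
    day = day or date.today()
    return f"## [{day.isoformat()}] {operation} | {summary}"


def append_log(log_path, operation, summary, day=None):
    """Append one entry to the log without touching earlier entries."""
    # Mode "a" only ever writes at the end of the file.
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("\n" + log_heading(operation, summary, day) + "\n")
```

For example, `append_log("wiki/log.md", "ingest", "Attention Is All You Need")` adds a dated ingest heading at the end of the log.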
Query
When I ask a question against the wiki:
- Read wiki/index.md to find the relevant hub-*.md file(s).
- Read those hub files to find specific pages.
- Read those specific pages. Use grep for keyword lookup if the wiki is large.
- Synthesize an answer with [[wiki-link]] citations (e.g. [[page-name]]).
- Decide whether to file the answer as a new wiki page:
  - File it if: the answer required reading 3+ wiki pages, or it produced a comparison/synthesis/analysis that doesn’t already exist in the wiki.
  - Don’t file it if: the answer was a simple factual lookup, or it largely restates a single existing page.
  - When filing, choose the appropriate subdirectory (comparisons/, syntheses/, queries/) and add it to the relevant hub(s).
- Output formats: Answers don’t have to be wiki pages. Depending on the question, appropriate formats include:
  - A wiki page (default for reusable knowledge).
  - A comparison table (markdown table in a wiki page or standalone).
  - A diagram or chart (described in markdown; I’ll render it in Obsidian or ask for a specific format).
  - A slide deck (Marp format, saved in wiki/).
  - Ask me if you’re unsure which format fits. Default to a wiki page.
- Append to log.md: ## [YYYY-MM-DD] query | Brief question summary.
Lint
When I say “lint”:
- Find orphan pages: pages not linked from any hub.
- Find contradictions: claims on one page that conflict with another.
- Find stubs: pages with frontmatter but no real content.
- Find missing pages: concepts mentioned but lacking their own page.
- Check for stale claims superseded by newer sources.
- Suggest questions to investigate next, or sources to fill knowledge gaps.
- Append to log.md: ## [YYYY-MM-DD] lint | Summary of findings.
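The orphan check in the first lint step can be sketched as follows. This is a simplification, not a full linter: `linked_names` and `find_orphans` are hypothetical helpers, links are matched by page name only, and pages are assumed to sit one level below the wiki root as in the directory structure above.

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def linked_names(text):
    """Page names targeted by wiki-links, with any folder prefix removed."""
    return {target.strip().split("/")[-1] for target in WIKI_LINK.findall(text)}


def find_orphans(wiki_root):
    """Content pages (one level below wiki_root) that no hub-*.md file links to."""
    root = Path(wiki_root)
    linked = set()
    for hub in root.glob("hub-*.md"):
        linked |= linked_names(hub.read_text(encoding="utf-8"))
    pages = [p for p in root.glob("*/*") if p.suffix in {".md", ".ipynb"}]
    return sorted(p.relative_to(root).as_posix() for p in pages if p.stem not in linked)
```

Two pages with the same stem in different subdirectories would be treated as one here, so a real check should compare full paths.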
Hard rules
- Never modify raw/ files. They are immutable sources of truth, not drafts.
- Never modify templates/ files. They are user-defined.
- Always use a templates/ file as a basis when creating a new file(s) in wiki/.
- Only create wiki content files inside these subdirectories: wiki/sources/, wiki/concepts/, wiki/entities/, wiki/comparisons/, wiki/syntheses/, or wiki/queries/. Never create content files directly in wiki/.
- Never delete log entries. wiki/log.md is append-only.
- Keep wiki/index.md small. It links to hubs only, not to individual pages.
- Frequently update the topic-specific wiki/hub-*.md files. They link to individual pages.
- Split large hubs. If a hub page exceeds ~40 entries, split it into sub-hubs (e.g. hub-ai-ml.md → hub-ai-architectures.md + hub-ai-training.md + hub-ai-applications.md). Update wiki/index.md to point to the new hubs. Keep the original hub as a redirect or remove it.
- Flag contradictions explicitly. If a new source contradicts an existing page, note it in the page with a > [!warning] Contradiction callout rather than silently overwriting.
- Prefer updating over creating. If a concept page already exists, update it. Don’t create a duplicate.
- Always use [[wiki-links]] for internal linking within files. Never use [markdown links](source.md).
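Two of the hard rules are mechanical enough to check in code: the ban on markdown-style internal links and the ~40-entry hub limit. A sketch with hypothetical helper names; it assumes external http(s) links are allowed (the rule only covers internal linking) and that each hub entry is a line beginning with `- [[`.

```python
import re

# Matches [text](target), skipping images (![...]) and external http(s) URLs.
MARKDOWN_LINK = re.compile(r"(?<!!)\[[^\[\]]+\]\((?!https?://)[^)]+\)")


def internal_markdown_links(text):
    """Return markdown-style internal links that should be [[wiki-links]] instead."""
    return MARKDOWN_LINK.findall(text)


def hub_entry_count(hub_text):
    """Count hub entries, assuming one '- [[page-name]] ...' line per entry."""
    return sum(1 for line in hub_text.splitlines() if line.lstrip().startswith("- [["))


def hub_needs_split(hub_text, limit=40):
    """True when the hub has more entries than the (approximate) limit."""
    return hub_entry_count(hub_text) > limit
```

The limit is written as "~40" in the rule, so treat a result near the boundary as a prompt to look at the hub rather than as an automatic split.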
When in doubt
- Can’t find a relevant hub? Create a new one. Add it to wiki/index.md.
- Source doesn’t fit any existing category? Create the most natural hub for it. Ask me if genuinely unsure.
- Unsure if a concept deserves its own page? Start it as a section within the most related existing page. Promote to its own page later if it grows or gets referenced from multiple pages.
- Unsure about a factual claim in a source? Include it but add a > [!question] callout noting the uncertainty.
- Two sources say different things? Include both claims with a > [!warning] Contradiction callout. Don’t pick a winner silently.
- A source covers a topic the wiki hasn’t touched before? Create the concept/entity pages anyway, even if they’re short. Stubs are better than gaps; they’ll get fleshed out on future ingests.