Pascal's Chatbot Q&As
Scaffolding Before Answers: The “Second Brain” That Makes AI Worth Trusting
by ChatGPT-5.2
Oliver Belitz did something deceptively simple—and quietly profound. Over a weekend, he spent roughly 2.5 million tokens (about €60) building a “second brain” for the EU AI Act: a structured, linked knowledge base that an LLM can navigate cleanly and consistently, instead of “shotgun” retrieving snippets the way many RAG systems do. His goal wasn’t a flashy chatbot. It was infrastructure: a system that knows its sources, ranks them by legal quality, and won’t confuse commentary with binding law—exactly the failure mode that makes AI risky in compliance-heavy domains.
Belitz explicitly frames this as a verdict against naive RAG for complex, cross-cutting questions: pre-compiled, curated structure can outperform ad-hoc retrieval when the user’s question spans multiple legal layers. He takes inspiration from Andrej Karpathy’s “feed raw documents → compile into a linked wiki” concept, but adapts it for law using Claude Code and an Obsidian-based knowledge graph: “no vector database, no RAG pipeline”—instead, a maintained markdown corpus where the agent can reason across stable pages and explicit cross-links.
What Belitz built that’s different from “just another knowledge base”
His refinements matter because they target the exact points where AI-assisted legal/compliance work fails:
Source hierarchy (not flat storage): He separates sources into ranked categories by legal quality—because a Delegated Act and a LinkedIn comment are not peers.
Collision logic: When sources conflict, the system flags the tension rather than “resolving” it with a fabricated certainty. That is a big shift: uncertainty belongs in the answer.
Statutory grounding requirement: For concrete legal questions, the answer must anchor in the statutory text itself; wiki pages are navigation, not authority.
Anti-anchoring rule: His own prior publications are ingested last and treated as historical data points—reducing the risk that the model simply revalidates his previous conclusions.
Semantic drift control: He acknowledges a core problem of living knowledge graphs: pages can slowly diverge from the underlying sources as edits accumulate—so he designs periodic audits beyond a basic “lint” check.
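The first two refinements can be sketched in a few lines. This is a minimal illustration, not Belitz's actual implementation: the tier names, the `Source` shape, and the `resolve` function are all hypothetical, and real collision detection would compare legal positions far more carefully than string equality.

```python
from dataclasses import dataclass
from enum import IntEnum

class Authority(IntEnum):
    """Illustrative tiers; a lower value means higher legal authority."""
    STATUTE = 1     # binding text: the AI Act, Delegated Acts
    GUIDANCE = 2    # official guidance, e.g. an EDPB opinion
    COMMENTARY = 3  # academic or practitioner commentary
    SOCIAL = 4      # a LinkedIn comment is a data point, not authority

@dataclass
class Source:
    title: str
    tier: Authority
    position: str  # the source's stance on the question at hand

def resolve(sources: list[Source]) -> dict:
    """Answer from the most authoritative tier present; if sources in
    that tier disagree, flag the collision instead of picking a winner."""
    best = min(s.tier for s in sources)
    primary = [s for s in sources if s.tier == best]
    positions = {s.position for s in primary}
    if len(positions) > 1:
        return {"status": "collision", "cited": sorted(s.title for s in primary)}
    return {"status": "grounded", "position": positions.pop(),
            "cited": sorted(s.title for s in primary)}
```

The point of the sketch is the asymmetry: lower-tier sources can never outvote a statute, and disagreement among equally ranked sources is surfaced as output rather than averaged away.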
Put plainly: Belitz is building an AI system that behaves less like an eager intern and more like a disciplined research function—hierarchical, cite-first, and uncertainty-aware.
Why Oliver Schmidt-Prietz’s comments matter
Schmidt-Prietz’s comments are important because they validate the approach while naming the hidden engineering realities—what makes this workable, and what makes it dangerous if you treat it like magic.
Token burn is not waste; it’s scaffolding.
He stresses that the expensive upfront ingestion feels inefficient, but you’re paying for structure, not content. That framing matters for teams trying to justify cost: the ROI comes later, as reuse compounds.
The wiki will drift unless you police it.
He’s blunt: drift is real, pages diverge quietly as the graph grows, and periodic audits are necessary. He explicitly calls “lint” a starting point, not the solution. This is the difference between a trustworthy system and a slowly rotting one.
Two vaults, not one: avoid poisoning your compliance brain.
His “regulatory vault vs general vault” point is one of those operational insights that should be a default pattern. Mixing citation-strict compliance material with looser cross-domain notes degrades both. If your “truth vault” gets contaminated by convenience content, the system’s outputs become un-auditable at exactly the moment you need auditability.
The compounding effect is the actual prize.
He describes the virtuous loop: every new source touches 5–15 existing pages; cross-references that used to cost hours happen automatically. That’s not just speed—it’s a new kind of coverage, where edge cases and tensions surface earlier.
His concrete example is telling: he asked whether special-category data can be processed to test an AI system for bias, and the wiki pulled the relevant AI Act provision, cross-referenced a GDPR basis, surfaced a key EDPB opinion, and highlighted the unresolved legal collision. That is exactly what professionals want: not a confident answer, but the correct map of authorities and the live fault-line between them.
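The compounding loop is easy to picture as backlink maintenance: ingesting one new page touches every existing page it cites. A minimal sketch, assuming the wiki is modeled as a plain backlink map (the function and page names are invented for illustration):

```python
def ingest(backlinks: dict[str, set[str]], new_page: str, cites: set[str]) -> list[str]:
    """Register a new source page and add a backlink on every page it
    cites; returns the touched pages (the 'compounding' count).
    In Belitz's reported experience, one new source touches 5-15 pages."""
    touched = []
    for target in sorted(cites):
        backlinks.setdefault(target, set()).add(new_page)
        touched.append(target)
    backlinks.setdefault(new_page, set())  # the new page starts with no backlinks
    return touched
```

Each ingestion pays forward: the cross-references that used to cost hours of manual lookup are now a side effect of adding the source at all.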
Why this effort matters (beyond legal tech)
This “second brain” approach is a preview of what serious agentic systems will require in any high-stakes domain: medicine, finance, engineering, safety, corporate policy, even scientific publishing.
RAG is often treated as a universal fix: “just retrieve the right chunks.” But Belitz and Schmidt-Prietz are describing a different thesis:
When the domain has hierarchies of authority, cross-document dependencies, and meaningful contradictions, retrieval alone isn’t enough.
You need pre-compiled structure that enforces discipline: what counts as primary vs secondary, what is binding vs persuasive, what is current vs historical, what conflicts and why.
In other words: they’re building the missing layer between raw documents and fluent AI—a governance-grade epistemic scaffold.
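One way to make that discipline machine-readable is per-page metadata, for example Obsidian-style frontmatter. The fields below are illustrative, not Belitz's actual schema:

```yaml
---
authority: statute          # statute | guidance | commentary | social
binding: true               # binding vs persuasive
status: current             # current vs historical
source: "AI Act, Art. 10(5)"
last_audited: 2025-01-12
---
```

With metadata like this on every page, "what counts as primary" stops being tribal knowledge and becomes something the agent can filter on.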
Caveats and failure modes (the price of being serious)
This approach isn’t “free truth.” It swaps one set of risks for another.
Upfront cost and ongoing maintenance: You pay tokens to build and tokens forever to keep it clean. A neglected wiki becomes a confident liar with prettier citations.
Semantic drift and update debt: Drift is a known failure mode; the bigger the graph, the more you need audit rhythms, versioning, and change-control.
Garbage-in hierarchy: A source ranking system is only as good as its design. If teams encode the wrong hierarchy (or institutional bias), the model will faithfully reproduce that bias at scale.
False sense of completeness: A beautifully linked wiki can create “map ≈ territory” overconfidence—especially when the user forgets what isn’t ingested yet (new case law, new guidance, new delegated acts).
Confidentiality and scope control: The “two vaults” warning generalizes: mixing confidential material with general corpora can create accidental leakage pathways, even if you never intended it.
Not a substitute for judgment: Collision logic can surface uncertainty; it cannot decide policy, risk appetite, or enforcement posture. Humans still own accountability.
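The drift caveat is partly automatable. A minimal sketch, assuming each page records a digest of the source text it was compiled from (the `audit` function and metadata field are hypothetical): this catches only one drift vector, a source that changed underneath the page, while semantic drift from accumulated wiki edits still needs the periodic human audits described above.

```python
import hashlib

def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def audit(page_meta: dict, current_source_text: str) -> str:
    """Compare the source snapshot the page was built against with the
    source as it stands today; a mismatch means human review, not lint."""
    if digest(current_source_text) != page_meta["source_digest"]:
        return "stale"  # the law or guidance moved under the page
    return "in-sync"
```

Running a check like this on an audit rhythm turns "update debt" from an invisible risk into a queue of flagged pages.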
Advantages for AI developers
Higher factuality under constraint: Less hallucination-by-context-sampling than “RAG shotgun,” because the system reasons over stabilized pages and explicit links.
Auditability by design: Source hierarchy + statutory grounding makes it easier to show why an answer was produced and which authority it relied on.
Better conflict handling: Collision logic prevents forced certainty and makes “known unknowns” first-class output.
Improved evaluation and debugging: When answers are produced from a maintained graph, failures can be traced to specific pages/edges, not a probabilistic retrieval accident.
Cheaper long-run iteration: Token-heavy ingestion can reduce repeated retrieval and repeated re-reading across sessions, especially for recurring questions.
Composable agent workflows: A clean markdown wiki becomes a stable interface that multiple agents/tools can share, instead of each tool building its own hidden context.
Safer personalization: The anti-anchoring rule is a practical mitigation against “model agrees with me because I wrote the corpus.”
Advantages for AI users (especially professionals)
Answers that respect authority levels: The system is explicitly designed not to treat an FAQ like binding law.
Faster cross-referencing without losing context: Users get integrated reasoning across legal levels and documents, not a pile of excerpts.
Transparent uncertainty instead of confident mush: Collision marking helps users see where the hard questions really are.
Consistency over time: A curated knowledge base reduces the “same question, different answer” randomness that undermines trust.
Better learning and onboarding: A graph of concepts + sources becomes a training system for humans, not just a tool for answers.
Reduced reputational and compliance risk: Statutory grounding makes it harder to operationalize an answer that can’t be defended.
Real productivity gains in the right places: The compounding effect—new sources updating many pages—turns maintenance into leveraged work rather than repeated manual re-analysis.
Closing thought: “Fat skills” need “fat foundations”
Belitz’s project is a reminder that the agent era won’t be won by prompting cleverness alone. It will be won by infrastructure that makes knowledge legible, ranked, conflict-aware, and auditable. Schmidt-Prietz’s comments matter because they drag the idea out of hype and into operations: budget the tokens, expect drift, separate vaults, and treat compounding as the payoff.
If you’re building AI for domains where truth has a hierarchy—and consequences—this is the direction of travel: not “more context,” but better epistemology.
