
This report moves the conversation away from “Is the model good?” and toward “Can a reviewing body reconstruct what happened in this case?” In administrative law, that question is decisive.

AI can make public administration look modern, efficient, and even accurate, while quietly making it legally unreviewable.


When AI Makes the Record Unprovable: The New Procedural Crisis in Administrative Law

by ChatGPT-5.2

Daniel Verloop’s The Substantiation Gap: AI-Mediated Record Formation Under Administrative Law is one of the most important procedural-law interventions in the current AI debate—not because it argues that AI is inaccurate, biased, or unlawful in the abstract, but because it identifies a more foundational failure mode: an authority can reach a substantively correct decision and still fail in court because it cannot prove how the record was formed.

That shift in focus is the paper’s real contribution. It moves the conversation away from “Is the model good?” and toward “Can a reviewing body reconstruct what happened in this case?” In administrative law, that question is decisive.

The paper’s thesis is sharp, conditional, and highly actionable: where AI materially shapes a contestable administrative act, and the authority cannot produce case-bound evidence of what was retrieved, what configuration governed the interaction, and what was adopted into the file, the authority cannot satisfy substantiation duties under existing law. In short: contestation presupposes provenance.

What the paper is really about

The paper is grounded in Dutch administrative law (Awb), but its logic is intentionally broader. It argues that many administrative systems already assume three structural commitments:

  • decisions must be reasoned,

  • relevant case materials must be producible,

  • courts (or reviewing bodies) must be able to test whether required procedures were followed.

AI-mediated retrieval, summarisation, and drafting can break that structure if the mediation layer is not preserved as case-bound evidence.

The author calls this the “substantiation gap”: a structural inability to demonstrate lawful procedure in the individual case, even when outputs look plausible and a human reviewed them.

That distinction matters enormously. The paper is not primarily about AI hallucinations or unfairness. It is about evidentiary architecture.

The key challenges and concerns

1) AI changes what is “available for consideration”

The paper’s core insight is that AI-mediated retrieval alters who (or what) determines the informational universe seen by the official.

Traditionally, an official could testify to what they considered. With AI-assisted retrieval/ranking/summarisation, the official sees a filtered and framed output, but often cannot attest to:

  • what the system retrieved but did not show,

  • what was omitted in summarisation,

  • what ranking/framing choices shaped salience,

  • which settings or prompts governed that interaction.

This is a profound procedural shift because availability itself becomes an external step that must be evidenced, not merely assumed.
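To make the shift concrete, here is a minimal sketch of what a case-bound interaction record might look like if "availability" were evidenced rather than assumed. All field names (`case_id`, `shown_to_official`, `config_snapshot`, etc.) are illustrative assumptions, not anything prescribed by the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json

@dataclass(frozen=True)
class RetrievedItem:
    doc_id: str
    rank: int
    shown_to_official: bool  # retrieved-but-not-shown items are evidence too

@dataclass
class InteractionRecord:
    """Case-bound record of one AI-mediated retrieval step (illustrative fields)."""
    case_id: str
    query: str
    config_snapshot: dict                 # model, prompt template, settings in force
    retrieved: list = field(default_factory=list)
    timestamp: str = ""

    def fingerprint(self) -> str:
        """Hash the record so later tampering or regeneration is detectable."""
        payload = json.dumps(
            {"case": self.case_id, "query": self.query,
             "config": self.config_snapshot,
             "items": [(i.doc_id, i.rank, i.shown_to_official) for i in self.retrieved]},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

rec = InteractionRecord(
    case_id="2024-0471",
    query="eligibility criteria housing benefit",
    config_snapshot={"model": "m-1", "top_k": 5},
    retrieved=[RetrievedItem("doc-9", 1, True), RetrievedItem("doc-3", 2, False)],
    timestamp=datetime.now(timezone.utc).isoformat())

# The retrieved-but-not-shown set is exactly what an official cannot attest to
# from memory, and exactly what this kind of record preserves:
unseen = [i.doc_id for i in rec.retrieved if not i.shown_to_official]
```

The point of the sketch is that the "informational universe" question becomes answerable only if the filtering step itself is captured at interaction time.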

2) Human review does not solve the problem

A recurring institutional defense—“a human reviewed and approved it”—is treated rigorously and, in my view, correctly dismantled.

The paper shows that attestation (“I reviewed it”) is not equivalent to evidence of procedure. Human review may establish formal responsibility, but it cannot reconstruct missing logs, missing configuration snapshots, or missing adoption traces after the fact.

That is a major challenge for public-sector AI deployments worldwide because many current governance programs over-index on “human in the loop” language while under-investing in provable procedural traceability.

3) Existing governance tools often inspect the system, not the case

One of the paper’s strongest concerns is that DPIAs, impact assessments, and compliance certifications are frequently system-level instruments. They may show that risks were assessed in general. They usually do not show:

  • what was retrieved in case X,

  • what settings applied in case X,

  • what the official changed or adopted in case X.

In the paper’s memorable framing: these instruments may inspect the pipes, but they do not test the water.

This has global significance because many regulators and public bodies currently treat compliance frameworks as proxies for legal defensibility in individual proceedings. The paper argues that this proxy often fails.

4) Procurement failures can foreclose substantiation

The paper repeatedly emphasizes that the gap is often a procurement failure, not just a vendor limitation.

If an authority procures a system for decision-grounding use without contractual rights and operational capability to export case-bound provenance, it has effectively foreclosed its own substantiation capacity before any dispute even arises.

This is a serious warning to governments worldwide: AI contracting is no longer only about data protection/security clauses; it is also about future courtroom survivability.

5) Retention mismatch is a hidden systemic risk

Many AI logs are retained according to operational/security/debugging policy, not legal contestation timelines. The paper highlights the resulting mismatch:

  • by the time an objection/appeal is filed,

  • the relevant interaction evidence may no longer exist.

This creates a particularly damaging failure mode: evidence that is not merely unavailable but destroyed by design, through default retention settings.
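The mismatch is simple arithmetic, which makes it easy to check before deployment. A minimal sketch, assuming a single objection window for illustration (real contestation timelines include appeals and subsequent proceedings, and the six-week figure below is the ordinary Dutch Awb objection period used as an example):

```python
from datetime import date, timedelta

def evidence_survives(decision_date: date,
                      log_retention_days: int,
                      objection_window_days: int) -> bool:
    """Does the interaction log still exist on the last day an objection
    can be filed? Illustrative: real timelines extend well past the first
    objection window once appeals and proceedings are counted."""
    log_expires = decision_date + timedelta(days=log_retention_days)
    objection_deadline = decision_date + timedelta(days=objection_window_days)
    return log_expires >= objection_deadline

# A 30-day debugging retention policy vs a 6-week (42-day) objection period:
ok = evidence_survives(date(2025, 1, 1),
                       log_retention_days=30,
                       objection_window_days=42)
```

With operational retention shorter than the challenge window, the answer is already "no" on the day the decision issues.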

6) The “black box” problem is being reframed correctly

A highly useful move in the paper is distinguishing interpretability from reconstructibility.

The author does not require public authorities to explain model internals or embedding mathematics. Instead, the paper says administrative review often only needs a much more practical evidentiary package:

  • what was queried,

  • what was surfaced,

  • under what configuration,

  • what was adopted.

That is a more realistic and legally operable standard—and one that can likely be implemented far sooner than deep model interpretability.

The most surprising statements and findings

1) A decision can be procedurally invalid even if substantively correct

This is the paper’s most striking and consequential finding.

The argument is not “AI makes wrong decisions.”
The argument is: even a correct outcome may be annulled if the authority cannot substantiate the procedure that produced the contested record.

That is both surprising and deeply plausible under administrative law logic.

2) The problem exists from deployment day one, not from first litigation

Another powerful finding: the gap is not created when a claimant challenges a decision. It exists the moment a system is deployed for decision-grounding use (DGU) without provenance infrastructure. Litigation merely reveals it.

This is important because many institutions still treat provenance/logging as something to improve later “once adoption matures.” The paper implies that in contestable administrative contexts, that sequencing is backwards.

3) “Proportionality” does not excuse absence of minimum evidence

The paper makes a binary claim that will likely be debated: proportionality can affect scrutiny and remedy, but it does not create an exception to the existence of a reviewable record.

In other words, there is no “proportionate absence” of core provenance. Either the minimal evidentiary substrate exists or it does not.

That is doctrinally provocative and operationally clarifying.

4) Ex post re-running the query is not a cure

Many teams assume they can recreate the interaction later. The paper carefully explains why this fails:

  • corpus/index changes,

  • permissions change,

  • configs/prompts/models change,

  • stochastic output differs.

So re-execution shows current capability, not historical procedure. That is a crucial finding for anyone relying on “we can reproduce later.”
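The drift problem can be stated mechanically. A minimal sketch, assuming hypothetical version identifiers: if everything that determines what a re-run could surface is fingerprinted at decision time, a later re-run is only probative when the fingerprints match, which in practice they will not.

```python
import hashlib
import json

def environment_fingerprint(corpus_version: str,
                            model_version: str,
                            config: dict) -> str:
    """Hash of everything that determines what a re-run could surface.
    If this differs from the fingerprint stored at decision time, a re-run
    demonstrates current capability, not the historical procedure."""
    blob = json.dumps({"corpus": corpus_version,
                       "model": model_version,
                       "config": config}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

# Environment at the time of the contested decision vs at appeal
# (version strings are hypothetical):
at_decision = environment_fingerprint("idx-2024-03", "m-1.4", {"top_k": 5})
at_appeal   = environment_fingerprint("idx-2025-01", "m-2.0", {"top_k": 10})
reproducible = (at_decision == at_appeal)
```

And this sketch still flatters re-execution: even with identical fingerprints, stochastic decoding can yield a different output, which is the paper's fourth bullet.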

5) The distinction between documented / contractually available / operationally accessible evidence

The paper’s “three-layer evidence test” is unexpectedly practical and one of its best contributions.

An artifact may:

  1. be documented in product materials,

  2. be contractually obtainable,

  3. be actually exported, retained, and linked to a dossier.

The paper argues only layers 2 and 3 truly matter for legal substantiation. This is an excellent diagnostic for regulators, auditors, procurement teams, and litigators.

The most controversial claims

1) The paper is doctrinally ambitious while acknowledging limited empirical testing

The author is candid that the argument is based on doctrinal and architectural reasoning, not broad empirical evaluation of actual deployments. That honesty is a strength—but also a likely point of attack.

Critics may say:

  • this overstates prevalence,

  • some systems already log enough,

  • courts may be more pragmatic than predicted.

The paper anticipates this by making the thesis conditional and falsifiable. Still, this will remain a central controversy.

2) The procedural burden is shifted firmly onto authorities, even where objectors cannot specify what is missing

The “catch-22” problem is handled in a way many public bodies may find uncomfortable: if objectors cannot know what was omitted because the system is opaque, that asymmetry cuts against the authority, not the claimant.

This is legally coherent in file-based review systems—but politically and administratively explosive.

3) Procurement choices become quasi-doctrinal fault lines

The claim that “provider limitations accepted at contracting” do not excuse non-production in proceedings is a hard-edged position. It effectively treats weak procurement terms as foreseeable self-inflicted legal risk.

That will be controversial in practice, especially in rushed public-sector AI deployments where legal, technical, and procurement functions are still siloed.

4) Courts should treat missing provenance as structural, not a tolerable irregularity

The paper argues that judicial pragmatism should have limits here, because missing provenance often prevents courts from performing the very prejudice analysis needed to forgive procedural defects.

This is a strong institutional claim, and some judges may resist it if they fear mass disruption or backlog effects.

The most valuable findings and concepts

1) Decision-Grounding Use (DGU) trigger

This is a genuinely useful concept.

The DGU trigger clarifies when AI use becomes procedurally relevant: when AI-mediated content enters the file, materially shapes the basis/reasoning, and the act is contestable (including parallel exposure through transparency requests).

This avoids both overreach (“all AI use is legally sensitive”) and underreach (“it’s just an assistant”). It is functional, not marketing-dependent.

2) Decision Provenance Bundle (DPB)

The DPB is the paper’s operational centerpiece and likely its most reusable idea globally.

Its four categories:

  • retrieval provenance,

  • configuration fixity,

  • adoption traceability,

  • retention + case-binding,

provide a concrete evidentiary minimum that bridges law, procurement, and system design.

This is exactly the sort of concept courts and regulators can work with because it translates abstract principles into inspectable artifacts.
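Because the DPB is defined as four inspectable categories, it can be sketched directly as a structure with a completeness check. The field names and the idea of a `missing_categories` diagnostic are my illustration, not the paper's specification:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionProvenanceBundle:
    """Sketch of the paper's four DPB categories as dossier-attachable
    artifacts. Field names are illustrative, not prescribed by the paper."""
    retrieval_provenance: list = field(default_factory=list)    # what was queried/surfaced
    configuration_fixity: dict = field(default_factory=dict)    # settings in force
    adoption_trace: list = field(default_factory=list)          # what the official kept/changed
    retention_case_binding: dict = field(default_factory=dict)  # case id + hold status

    def missing_categories(self) -> list:
        """Substantiation needs all four categories; report any empty one."""
        gaps = []
        if not self.retrieval_provenance:
            gaps.append("retrieval provenance")
        if not self.configuration_fixity:
            gaps.append("configuration fixity")
        if not self.adoption_trace:
            gaps.append("adoption traceability")
        if not self.retention_case_binding:
            gaps.append("retention + case-binding")
        return gaps

# A bundle that captured retrieval and configuration but nothing else:
bundle = DecisionProvenanceBundle(
    retrieval_provenance=[{"query": "q1", "surfaced": ["doc-9"]}],
    configuration_fixity={"model": "m-1", "top_k": 5})
gaps = bundle.missing_categories()
```

The diagnostic mirrors the paper's binary logic: partial provenance does not degrade gracefully, because review needs the whole envelope.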

3) Interpretability vs reconstructibility

This distinction is gold.

It prevents the debate from collapsing into impossible demands (“explain the whole model”) while preserving what legal review actually needs (“show what happened in this case”).

4) “Procedural failure without substantive error”

This is a sophisticated and globally relevant framing. It explains why institutions can appear to be functioning until challenge processes activate, and then suddenly face cascading legal vulnerability.

For public administration, that is a much more realistic risk model than “AI failed because it hallucinated.”

What this means for AI use worldwide

1) AI adoption in government will increasingly split into two tracks

We should expect a growing divide between:

  • low-risk productivity use (translation, drafting support, internal brainstorming not inserted into contestable records), and

  • decision-grounding use (retrieval/summarisation/drafting affecting reasons/files in contestable decisions).

The second category will require evidentiary infrastructure, not just policy documents.

2) “Human oversight” standards will be reinterpreted through evidentiary capability

Globally, many laws and frameworks rely on human oversight language. This paper suggests oversight should increasingly be judged by whether the institution can show:

  • what the human was given,

  • what the human changed,

  • and what evidence existed at the time.

Expect oversight debates to move from role descriptions to traceability artifacts.

3) Procurement and vendor due diligence will become litigation strategy by another name

Public-sector AI buying decisions will have to incorporate:

  • export rights,

  • logging granularity,

  • retention guarantees,

  • provenance portability,

  • wrapper-layer transparency,

  • auditability at the case level.

In other words, AI procurement teams will need to think like future appellate counsel.

4) Courts may become indirect architects of public-sector AI infrastructure

If reviewing bodies start insisting on case-bound provenance, they will effectively shape system design without writing technical standards. Judicial doctrine can drive:

  • log schema expectations,

  • retention practices,

  • evidentiary packaging norms,

  • burden allocation rules.

This is likely to happen unevenly, but once a few influential decisions land, the effect could be rapid.

5) The same logic extends beyond the Netherlands

The paper’s comparative claim is persuasive at a structural level: wherever systems require reason-giving, a producible file, and effective review, AI-mediated record formation without case-bound traces creates similar risks.

That suggests relevance not only in EU Member States but in many common law and civil law systems with administrative appeals, tribunals, ombuds, judicial review, and disclosure obligations.

6) “Sovereign AI” or open-weight deployment will not automatically solve the problem

The paper explicitly rejects the idea that the risk disappears if the model is open-weight, self-hosted, or “sovereign.” If evidence infrastructure is not built, the gap remains.

This is a crucial global lesson. Political branding around AI sovereignty should not be confused with procedural reviewability.

What we can expect as a result

If the paper’s logic gains traction, expect the following developments:

  1. More disputes framed around file production and substantiation rather than model fairness/explainability.

  2. Procurement rewrites requiring case-bound provenance export and retention.

  3. Vendor product differentiation around “court-grade provenance” features.

  4. Regulatory guidance that separates compliance telemetry from dossier-attachable evidence.

  5. Judicial checklists/tests (similar to the paper’s operational test) for determining whether AI use created a procedural verification gap.

  6. Institutional triage: agencies restricting AI use in contestable workflows until provenance capability is proven.

  7. New doctrine on remedies, especially around annulment, curing defects, and whether missing provenance can ever be “passed over” without collapsing reviewability.

Recommendations for courts

1) Treat provenance as a threshold reviewability issue

Before debating explainability, fairness, or model quality, courts should first ask:

  • Was AI use decision-grounding?

  • Can the authority produce case-bound provenance sufficient to review the contested act?

If not, courts should recognize a procedural reviewability defect.

2) Distinguish reconstructibility from interpretability

Courts should avoid requiring impossible explanations of model internals where unnecessary. The legally relevant question in many cases is whether the authority can reconstruct the informational and procedural envelope of the decision.

3) Require specificity on what was available for consideration

Authorities should not be allowed to rely on generic descriptions of AI systems where the contested issue is case-specific availability, omission, framing, or adoption.

4) Scrutinize “human reviewed” attestations as incomplete evidence

Courts should treat attestation as one artifact, not a substitute for retrieval/configuration/adoption provenance.

5) Use confidentiality mechanisms as channels, not excuses

Where provenance artifacts contain sensitive or proprietary information, courts should prefer restricted-access/confidential handling mechanisms rather than accept non-production.

6) Be cautious with procedural-defect forgiveness doctrines

Where the defect is the absence of evidence needed to assess prejudice, courts should recognize the circularity in “forgiving” the defect on the assumption that no harm occurred.

Recommendations for regulators

1) Define a minimum case-bound provenance standard for public-sector decision-grounding AI

Regulators should issue guidance (or binding rules where appropriate) that specifies minimum evidentiary categories for contestable administrative use, closely aligned with the paper’s DPB logic.

2) Separate compliance logging from legal substantiation logging

Regulatory guidance should explicitly distinguish:

  • operational/security/compliance telemetry, and

  • dossier-producible, case-bound procedural evidence.

Confusing these is one of the major governance failures identified by the paper.

3) Make procurement requirements explicit

Regulators should require public-sector procurers to address, at a minimum:

  • exportable retrieval provenance,

  • configuration snapshots,

  • adoption traceability,

  • retention through challenge periods and proceedings,

  • wrapper-layer as well as underlying-model transparency.

4) Align retention policies with legal contestation timelines

Agencies should be required to implement litigation-hold style retention rules for DGU-relevant interactions, rather than default product retention settings.

5) Mandate deployment testing against falsification-style criteria

Before rollout, regulators should encourage or require a practical test:
“Can this deployment export a complete provenance bundle for a simulated contested case under ordinary conditions?”

If not, deployment should be restricted from decision-grounding uses.
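The quoted test lends itself to automation. A minimal sketch, assuming a simulated contested case produces an export dict with the four DPB-style categories (the key names are my assumption):

```python
def can_export_full_bundle(export: dict) -> tuple:
    """Pre-deployment check (sketch): given the export produced for a
    simulated contested case, verify every provenance category is present
    and non-empty. Returns (passed, list_of_missing_categories)."""
    required = ("retrieval_provenance", "configuration_fixity",
                "adoption_trace", "retention_case_binding")
    missing = [k for k in required if not export.get(k)]
    return (not missing, missing)

# Simulated export from a dry-run contested case; note the system captured
# nothing about what the official actually adopted into the file:
export = {
    "retrieval_provenance": [{"query": "q", "surfaced": ["d1"], "not_shown": ["d2"]}],
    "configuration_fixity": {"model": "m-1"},
    "adoption_trace": [],
    "retention_case_binding": {"case_id": "sim-001", "hold": True},
}
passed, missing = can_export_full_bundle(export)
```

Run under ordinary operating conditions, a failing result like this one is exactly the signal that the deployment should be restricted from decision-grounding uses until the gap is closed.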

6) Build cross-functional enforcement capacity

This problem sits at the intersection of:

  • administrative law,

  • procurement,

  • records management,

  • IT architecture,

  • data protection,

  • litigation readiness.

Regulators should avoid siloed oversight and create integrated review capacity.

7) Prioritize high-contestation domains first

Benefits, permits, licensing, tax, immigration, enforcement, and other high-volume/high-impact domains should be prioritized for provenance controls before AI use scales further.

Final assessment

This paper is valuable because it identifies a risk many institutions are currently missing: AI can make public administration look modern, efficient, and even accurate, while quietly making it legally unreviewable.

Its strongest contribution is not anti-AI. It is anti-handwaving.

The author’s framework gives courts and regulators a practical way to separate:

  • AI use that is mostly productivity assistance, from

  • AI use that alters the evidentiary foundations of contestable state action.

If this argument is right—and I think much of it is—then the next major wave of AI governance will not be led by model benchmarks or ethics statements. It will be led by something more prosaic and more powerful: what can be produced in the file when a decision is challenged.