AI litigation: Judge Eumi Lee’s questions function like a checklist of what class actions live or die on: a workable class definition, an administrable method to identify class members and...

...covered works, credible theories of common proof, and procedural discipline. The difference between “this feels wrong” and “this is certifiable” is preparation.

“The Recipe Problem”: How to Turn AI-Training Grievances into a Class Action That Survives the First Judge

by ChatGPT-5.2

The Courthouse News Service report captures a familiar collision: creators and publishers alleging that a platform-scale AI system was trained on copyrighted works without meaningful consent, while the defendant argues (1) licensing language already permits the use, (2) individualized issues overwhelm common ones, and (3) nobody can even prove which works were used. In the hearing described, Judge Eumi Lee’s questions function like a checklist of what class actions live or die on: a workable class definition, an administrable method to identify class members and covered works, credible theories of common proof, and procedural discipline. The difference between “this feels wrong” and “this is certifiable” is preparation.

Below are the lessons for authors and publishers who want to initiate or support similar litigation, and then a “perfect” to-do list—built directly around the judge’s concerns and the defendant’s pressure points.

Lessons for authors and publishers considering similar litigation

1) Your class definition is not a vibe—it’s a machine-readable specification

The judge pressed for an “operative class definition” and what exactly she’d be certifying. That is the whole ballgame. If your definition boils down to “anyone whose works were used,” you’ve already triggered the defendant’s best response: overbroad, unascertainable, impossible to adjudicate.

Lesson: Draft the class like an engineer: objective criteria, time bounds, product bounds, and a feasible identification method. If you can’t express it cleanly, you can’t certify it.

2) “Google doesn’t keep lists” is either a gift or a trap—treat it as both

Google’s attorney argued they don’t keep lists of individual works used to train Gemini; plaintiffs’ counsel countered that this “doesn’t make sense” because improving models implies tracking inputs (“recipe/cake” analogy).

Lesson: You need a litigation-ready theory of how training data provenance must exist (or can be reconstructed), plus discovery plans to prove it. But you also need a fallback if the defendant truly lacks itemized records: alternative proof methods (dataset snapshots, crawler logs, storage buckets, hash matching, sampling, third-party data brokers, vendor traces, watermarking/near-duplicate detection, etc.). If you bet everything on “they must have a recipe list,” the defendant will try to win by saying “we don’t.”
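The hash-matching and near-duplicate fallback mentioned above can be sketched in a few lines. This is a minimal illustration, not a forensic protocol: the function names, the five-word shingle size, and the normalization rules are all assumptions chosen for the example, and a real expert analysis would need a validated, documented methodology.

```python
from __future__ import annotations

import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def fingerprint(text: str) -> str:
    """SHA-256 of the normalized text: equal values mean an exact match
    once case, punctuation, and spacing differences are ignored."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of overlapping k-word windows taken from the normalized text."""
    words = normalize(text).split()
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # two empty texts: treated as no evidence of overlap
    return len(sa & sb) / len(sa | sb)
```

An exact `fingerprint` match is the strongest signal; a high `jaccard` score flags candidates for closer review. Any threshold for "near-duplicate" is a judgment call that an expert would have to defend.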

3) If you seek injunctive relief, define the operational remedy—not just the moral one

Plaintiffs sought injunctive relief: transparency and consent before use. Courts often like injunctions more than sprawling damages theories—but only if the remedy is specific, enforceable, and technically coherent.

Lesson: Don’t demand “stop using copyrighted works” in the abstract. Specify the compliance mechanism: consent signals, opt-out registries, dataset quarantining, model re-training rules, documentation obligations, audit rights, and notice requirements. A judge will implicitly ask: How would I supervise this?

4) Defendant licensing language must be attacked surgically, not rhetorically

Google leaned on broad license terms for uploaded content, arguing it can “develop and improve existing services” and that the language is clear—even if sweeping.

Lesson: Plaintiffs and supporting publishers must map license text to specific uses and argue why training a generative model is not covered (or why the license is unconscionable, misleading, contrary to users’ reasonable expectations, or in conflict with copyright law). If you don’t dismantle the license pathway, the case collapses into individualized fights about who agreed to what.

5) Named plaintiffs must be immaculate on standing, ownership, and chain-of-title

Google “poked holes” in individual claims: publication before registration, third-party licenses (e.g., Mattel), alleged lack of copyright in a cover image, “work for hire,” and mischaracterized ownership.

Lesson: The fastest way to kill a class case is to discredit your representatives. Plaintiffs must be the cleanest examples, with documentary proof that survives deposition and public scrutiny. Publishers can help here by supplying chain-of-title rigor and rights-management evidence.

6) Intervention strategy must be timed like litigation chess, not industry theater

Two publishers sought to intervene; the judge questioned why they waited, and Google argued prejudice because discovery might need reopening. Plaintiffs argued publishers weren’t necessary and could delay proceedings.

Lesson: Publishers should decide early whether they are:

  • Class members only (quiet support, declarations, expert resources),

  • Intervenors (active parties with their own interests), or

  • Parallel plaintiffs (separate but coordinated litigation).

Late intervention can look opportunistic and procedurally disruptive—even if substantively important.

7) Procedural discipline is not optional—judges will sanction sloppiness

The report notes a sanctions motion accusing plaintiffs of expanding class scope without meeting and conferring, contrary to the judge’s standing orders. The judge was frank: even if meet-and-confer seems futile, that’s not an excuse.

Lesson: In class actions, procedural missteps become credibility problems. Follow standing orders obsessively. Treat meet-and-confer as a strategic tool and a litigation hygiene requirement.

8) Publishers add leverage, but also complexity—use them to strengthen common proof

Publishers argued they have a “larger stake” and “unique reproduction rights,” and could bring expert analysis. Plaintiffs argued authors already represented copyright owners and publishers might delay.

Lesson: Publishers can materially improve the case by:

  • producing cleaner rights and registration evidence,

  • providing scale and damages modeling expertise,

  • supplying technical experts and dataset tracing capability,

  • demonstrating market harm and licensing norms.

But publishers must do this in a way that reduces individualized issues rather than inflaming them.

9) “Millions of claims” is not a feature—unless you can manage it

The plaintiffs sought a class potentially including millions of claims. That scale is persuasive politically but dangerous legally.

Lesson: The bigger the class, the more you must show:

  • commonality (shared factual and legal questions),

  • typicality (representatives are not outliers),

  • adequacy (counsel and reps can represent everyone),

  • ascertainability and administrative feasibility (who is in the class and how we know).

If you can’t administer it, shrink it—then expand later if warranted.

The perfect to-do list for preparing class action cases, according to ChatGPT (and built from the judge’s questions and comments)

A. Class design that a judge can certify without guessing

  1. Draft 2–3 alternative class definitions (narrow, medium, broad), each with:

    • clear time period(s),

    • clear product/model scope,

    • clear ingestion pathway(s) (web crawling vs user uploads vs third-party datasets),

    • clear inclusion/exclusion rules (e.g., works licensed, public domain, opt-in partners).

  2. Write an “administrability memo”: step-by-step how class membership and covered works will be determined.

  3. Pre-build a class notice and claims process concept that is feasible at scale (even if damages are not the immediate goal).
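One way to test whether a class definition is objective is to write it down as data plus a yes/no membership rule. The sketch below does that for the criteria listed in item 1. Everything in it is hypothetical: the field names, the product labels `ModelA` and `ModelB`, the dates, and the assumption that an ingestion date and pathway can be established for each work at all.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum


class Pathway(Enum):
    WEB_CRAWL = "web_crawl"
    USER_UPLOAD = "user_upload"
    THIRD_PARTY_DATASET = "third_party_dataset"


@dataclass(frozen=True)
class Work:
    registered: bool
    ingested_on: date              # assumed to be provable from discovery
    product: str                   # hypothetical model/product label
    pathway: Pathway
    licensed_to_defendant: bool = False
    public_domain: bool = False


@dataclass(frozen=True)
class ClassDefinition:
    name: str
    start: date
    end: date
    products: frozenset[str]
    pathways: frozenset[Pathway]
    require_registration: bool = True

    def covers(self, work: Work) -> bool:
        """Objective membership test: exclusions first, then the bounds."""
        if work.licensed_to_defendant or work.public_domain:
            return False
        if self.require_registration and not work.registered:
            return False
        return (
            self.start <= work.ingested_on <= self.end
            and work.product in self.products
            and work.pathway in self.pathways
        )


# Two of the alternative definitions, with made-up bounds.
NARROW = ClassDefinition(
    name="narrow",
    start=date(2023, 1, 1),
    end=date(2023, 12, 31),
    products=frozenset({"ModelA"}),
    pathways=frozenset({Pathway.WEB_CRAWL}),
)
BROAD = ClassDefinition(
    name="broad",
    start=date(2020, 1, 1),
    end=date(2024, 12, 31),
    products=frozenset({"ModelA", "ModelB"}),
    pathways=frozenset(Pathway),
    require_registration=False,
)
```

If a criterion cannot be expressed as a field and a comparison like these, that is a hint the definition still depends on case-by-case judgment.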

B. Proof of use: solve the “recipe problem” before the defense weaponizes it

  1. Define the “proof of training use” standard you will ask the court to accept (direct logs, dataset lists, probabilistic matching, sampling + inference).

  2. Prepare a provenance discovery plan (requests + deposition targets) aimed at:

    • crawler logs and crawl policies,

    • dataset assembly pipelines,

    • storage locations and dataset versions,

    • governance documents and data sourcing approvals,

    • vendor/partner dataset contracts,

    • model cards / internal documentation / evaluation sets,

    • red-team outputs and memorization tests.

  3. Develop a reconstruction fallback if itemized records are absent:

    • third-party dataset identification,

    • hashing/near-duplicate detection against known corpora,

    • statistical sampling protocols,

    • controlled prompt-output tests to show memorization/derivation patterns (with strict methodological discipline).

  4. Retain technical experts early—not just for trial, but to shape discovery and class feasibility arguments.
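The statistical sampling fallback in item 3 can be illustrated with a reproducible random draw and a confidence interval for the share of sampled works found in the training data. This is a sketch under stated assumptions: the function names are invented for the example, the Wilson score interval is one common choice rather than a method the source prescribes, and the 1.96 multiplier corresponds to a roughly 95% confidence level.

```python
from __future__ import annotations

import math
import random


def draw_sample(work_ids: list[str], n: int, seed: int) -> list[str]:
    """Draw n works without replacement. Sorting first and fixing the seed
    lets the opposing side reproduce exactly the same sample."""
    rng = random.Random(seed)
    return rng.sample(sorted(work_ids), n)


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion of hits out of n sampled works."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For example, `wilson_interval(hits, n)` applied to the works an expert confirms as ingested gives a range for the ingestion rate in the wider catalog. Whether a court accepts that kind of inference as proof of use is a legal question the code cannot answer.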

C. Injunction engineering: make the remedy enforceable, not aspirational

  1. Draft a proposed injunction in operational terms, including:

    • transparency obligations (what must be disclosed, to whom, and when),

    • consent/opt-out mechanisms,

    • dataset quarantine and deletion workflows,

    • training-data governance controls and audits,

    • model update obligations and compliance reporting.

  2. Stress-test enforceability: if the judge asks “How do we know what works are covered?”, your remedy must answer that question.

D. Licenses and consent

  1. Collect every relevant license and policy (platform TOS, upload licenses, API terms, developer agreements, product terms) and build:

    • a timeline of term changes,

    • a comparison matrix across products,

    • an argument map: which language covers what exact activity, and why training is different (or not).

  2. Build “reasonable expectations” evidence (UI design, disclosures, consent flows, dark patterns, consumer understanding) to counter “it’s clear” defenses.

  3. Segment plaintiffs/class members by consent posture:

    • never uploaded,

    • uploaded unknowingly (e.g., auto-indexed or mirrored),

    • uploaded knowingly but without training consent,

    • uploaded under commercial licenses with limitations.

E. Plaintiff selection: make representatives deposition-proof

  1. Run a chain-of-title audit for every named plaintiff and key class exemplars:

    • registrations, assignments, publication dates,

    • works-for-hire status,

    • prior exclusive licenses (e.g., toy companies, studios, publishers),

    • cover art ownership vs interior content,

    • co-authors/co-illustrators and split rights.

  2. Create a “standing binder” per plaintiff: everything needed to defeat ownership/registration attacks in one package.

  3. Pre-mortem deposition vulnerabilities: simulate the defense cross-exam and patch inconsistencies before filing.

F. Publisher role: decide early whether to intervene, support, or parallel-file

  1. Choose the publisher posture and stick to it:

    • early intervention if unique rights/relief are critical,

    • amicus/support declarations if speed and cohesion matter,

    • separate action if the publisher theory diverges (e.g., database rights, contract, unfair competition).

  2. If intervening, justify timing with a concrete reason that a judge will respect (not “we were waiting” but “new procedural posture/decision point makes our participation necessary now”).

  3. Ensure publisher participation reduces individualized issues (standard-form contracts, representative catalogs, clean rights evidence).

G. Rule compliance and litigation hygiene: eliminate self-inflicted wounds

  1. Follow the judge’s standing orders like code:

    • meet-and-confer requirements,

    • motion practice rules,

    • discovery sequencing,

    • page limits, exhibit rules, sealing protocols.

  2. Institutionalize meet-and-confer:

    • documented agendas,

    • written proposals,

    • dispute narrowing,

    • a clean record showing diligence (it matters when sanctions are threatened).

  3. Scope control discipline: if you expand the class or claims, do it through agreed process or clearly justified motion practice, not surprise.

H. Narrative and strategy: show the court why class treatment is necessary

  1. Write the “why class” story in judicial language:

    • common conduct by defendant,

    • common legal questions,

    • efficiency and consistency,

    • impossibility of individual suits at scale.

  2. Anticipate the defense “individualized issues” attack and neutralize it with:

    • subclassing,

    • sampling plans,

    • common proof frameworks,

    • standardized damages/injury theories (or focus on injunctive relief first).

  3. Prepare a phased plan (injunction first, damages later) so the judge can grant something workable without committing to chaos.

Closing thought

The hearing described is a reminder that class actions against AI developers will be won or lost less on abstract debates about innovation and more on administrability: Can a court define the class, identify the works, test the consent theory, and craft enforceable relief without drowning in individualized fact patterns? The “perfect” preparation is therefore not maximalist—it’s disciplined: narrow where needed, forensic on ownership and licenses, technically literate on data provenance, and procedurally impeccable.