The Attribution Laundering Theory: How 3D Creators Are Using DMCA §1202 to Fight AI Training at Scale

by ChatGPT-5.2

Across four near-parallel class action complaints filed by 3D artist Austin Beaulier against NVIDIA, Meta, Microsoft, and Roblox, the core grievance is strikingly consistent: the defendants allegedly built (or powered) commercial generative-3D systems using 3D assets harvested via large research datasets—especially Objaverse-XL and a derivative subset called TRELLIS-500K—while stripping the machine-readable copyright management information (CMI) that made those assets legally re-usable under Creative Commons terms in the first place.

This is not framed as a front-door “AI training is infringement” case. It is framed as a CMI case—a claim that the defendants’ pipelines removed (or failed to preserve) attribution and license metadata, and then used/distributed the resulting CMI-stripped works, in violation of the DMCA.

1) What the lawsuits say the defendants did (the grievance, in one pipeline)

The complaints describe a repeatable “industrial” pattern:

  1. Creators upload 3D models to repositories (e.g., Sketchfab, Thingiverse, Polycam, GitHub), selecting Creative Commons licenses that commonly require attribution and may restrict commercial use or impose share-alike conditions.

  2. Dataset curators compile these into large training datasets, especially Objaverse-XL (said to exceed 10 million assets) and TRELLIS-500K (said to be ~500,000 assets derived from Objaverse-XL).

  3. Commercial AI developers use these datasets as a map to locate and download the underlying 3D works.

  4. During conversion/rendering/normalization (and related preprocessing), the “creator-identifying information, licensing metadata, and other CMI” is allegedly removed or not preserved, yielding “CMI-stripped representations” (a minimal illustration follows this list).

  5. Those stripped representations are then used inside monetized generative-AI systems—systems that can generate 3D meshes/assets through APIs and are integrated into commercial ecosystems.
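
To make step 4 concrete, here is a minimal, hypothetical sketch (not any defendant’s actual pipeline) of how a routine mesh-normalization step can produce model-ready geometry that no longer carries the uploader’s name, the license URL, or any other metadata that accompanied the source file. It assumes the open-source trimesh library; the function and dictionary field names are illustrative.

```python
# Hypothetical sketch: a typical mesh-normalization step whose output carries
# no attribution or license information. Assumes the trimesh library.
import numpy as np
import trimesh

def normalize_for_training(path: str) -> dict:
    """Load a 3D asset and return only centered, unit-scaled geometry."""
    mesh = trimesh.load(path, force="mesh")          # geometry only
    verts = np.asarray(mesh.vertices, dtype=np.float32)
    verts = verts - verts.mean(axis=0)               # center at the origin
    scale = np.abs(verts).max()
    if scale > 0:
        verts = verts / scale                        # scale into [-1, 1]
    return {
        "vertices": verts,
        "faces": np.asarray(mesh.faces, dtype=np.int64),
        # Note: nothing here carries the glTF asset.copyright string, the
        # uploader's account name, or the Creative Commons deed URL that
        # accompanied the file on the source repository.
    }
```

The point is not that such code is unlawful in itself; it is that nothing in the output preserves the attribution or license signals the complaints describe as CMI.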

The defendant-specific “hooks” are then used to make that generalized pipeline feel concrete:

  • Microsoft is portrayed as both builder and distributor in the chain: it allegedly trained TRELLIS (and later versions) on Objaverse-derived assets, and also publicly released TRELLIS-500K without per-file license information, allegedly reducing downstream visibility into license conditions and attribution obligations.

  • NVIDIA is portrayed as a major commercial user of Objaverse-derived data and is tied in particular to TRELLIS-500K-derived training inputs.

  • Meta is tied to its generative-3D efforts (the complaint describes a 3D system trained on Objaverse-XL-derived data) and alleged integration into broader AR/VR and creator ecosystems.

  • Roblox is tied to its generative-3D tooling for its developer ecosystem (including an API integration), and is alleged to have trained on a large dataset including Objaverse-XL-derived materials.

In short: the lawsuits claim the defendants industrialized community-shared assets into commercial 3D generation, while severing the machine-readable attribution/licensing thread that made that community sharing viable.

2) The legal theory: CMI stripping under DMCA §1202

The legal strategy here is sophisticated and, frankly, revealing about where plaintiffs think leverage exists in AI litigation.

Instead of betting everything on contested questions like “is training fair use?” the complaints emphasize 17 U.S.C. §1202(b)—the DMCA provisions that prohibit:

  • §1202(b)(1): intentionally removing or altering CMI; and

  • §1202(b)(3): distributing/using works knowing CMI has been removed/altered, with the required mental state.

The complaints repeatedly assert that the information attached to the models—creator identity, title, and licensing terms—qualifies as “CMI” under §1202(c), and that preprocessing pipelines predictably separate content from metadata such that attribution and license info do not survive into training corpora. The key “why it matters” assertion is that stripping CMI makes it impossible for downstream users (and sometimes even the AI developer’s own product layer) to identify the creator or comply with license conditions—especially attribution and non-commercial restrictions. One complaint makes this point bluntly: removal of CMI allegedly “ensured that neither the resulting AI systems nor their users could identify the creators … or comply with the license conditions.”

That is the hidden engine of these cases: CMI stripping is described as the mechanism that converts “licensed sharing” into “untraceable extraction.” If a court accepts that framing, §1202 becomes a powerful way to police “license laundering” at scale—even where direct infringement claims are complicated.

3) The most surprising, controversial, and valuable statements

Surprising (strategically): “This is not about challenging AI research.”
The complaints go out of their way to say the case isn’t a referendum on generative AI itself; it’s about commercial deployment without honoring the license conditions that came with the works. That rhetorical move is not just PR. It’s a litigation posture: it tries to make the court comfortable granting relief without feeling like it is “banning AI.”

Controversial (legally): treating preprocessing as “CMI removal.”
A central pressure point is whether converting a 3D model into rendered views, normalized meshes, or other ML-ready representations that no longer carry the original metadata is legally equivalent to removing or altering CMI—especially if the metadata is not literally deleted from the original file but simply not carried forward. Defendants will argue “we didn’t remove CMI; we transformed data for training and the metadata wasn’t part of the training tensor.” Plaintiffs are trying to turn that engineering reality into DMCA liability.

Valuable (for other rights owners): “Each work is a separate violation.”
The complaints allege that each work from which CMI was removed/altered constitutes a separate §1202(b)(1) violation, and they seek statutory damages and/or actual damages and profits. If that framing survives early motions, it creates settlement pressure: §1202 can scale damages across a dataset in a way that traditional infringement theories sometimes struggle to do cleanly.

Valuable (fact pattern): “Objaverse-XL preserves links back to creators.”
The complaints repeatedly emphasize that Objaverse-XL preserves references to source repositories and creator accounts, making the artists “not anonymous or unknowable,” and making attribution feasible—at least at the dataset level. This is important because it attacks the common practical excuse in AI training: “we can’t attribute at scale.” Plaintiffs are effectively saying: the dataset already did the hard part; you chose to sever it.
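
For readers who want to test this kind of claim themselves, a rough sketch of auditing provenance coverage in a dataset manifest is below. The objaverse.xl.get_annotations() helper and the column names are assumptions about that package’s public interface, not facts drawn from the complaints; substitute whatever manifest and fields your dataset actually exposes.

```python
# Illustrative provenance audit over a dataset manifest (assumed schema).
import pandas as pd

try:
    import objaverse.xl as oxl                            # assumed package layout
    annotations: pd.DataFrame = oxl.get_annotations()     # assumed helper
except Exception:
    # Fallback: a hypothetical local export of the same manifest.
    annotations = pd.read_csv("objaverse_xl_annotations.csv")

# What fraction of records still point back to a source repository and a license?
for col in ("source", "license", "fileIdentifier"):       # assumed column names
    if col in annotations.columns:
        print(f"{col}: {annotations[col].notna().mean():.1%} populated")
```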

4) Is the evidence any good?

At the pleading stage, the evidence posture is plausibility-driven rather than “smoking gun”:

What looks strong:

  • Public admissions / public documentation: The complaints lean on the defendants’ public disclosures about using Objaverse-derived data or releasing/using TRELLIS-500K-type resources and training pipelines. That’s a credible way to plead “access” and “use,” and it avoids the pure-speculation vibe that hurts some training-data cases.

  • Technical inevitability argument: The pleadings make a commonsense claim: ML preprocessing pipelines commonly discard or fail to carry forward human-readable attribution/licensing metadata into model-ready representations. Even if that’s not “proof,” it is plausible—and plausibility is the currency of surviving a motion to dismiss.

  • Traceability: The emphasis that Objaverse-XL retains links to source and creators sets up a strong “feasibility of compliance” narrative: defendants can’t credibly say attribution compliance was impossible if they started from a dataset that preserved provenance.

What looks vulnerable:

  • Mens rea (“knew or had reasonable grounds to know”): §1202 claims often rise or fall on whether plaintiffs can show the defendant knew removal would “induce, enable, facilitate, or conceal infringement,” and knew (or had reason to know) the CMI was removed without authority. The complaints assert this, but much of it is inferential. Defendants will push hard on the idea that loss of metadata in a preprocessing pipeline is not the same as intentionally removing CMI with the statutorily required knowledge.

  • Who removed what: Another likely defense move is to point upstream: dataset curators, repository export formats, or intermediate tooling may have omitted metadata, and defendants may argue they merely used what they received. Plaintiffs respond by describing the defendant’s own pipeline steps (copying, converting, rendering, normalizing) as the point of removal/failure-to-preserve—but proving that later is nontrivial.

  • CMI scope fights: Defendants may contest whether certain license/attribution fields qualify as “CMI” in the relevant statutory sense, especially if they are not embedded in the work itself but appear as adjacent platform metadata. Plaintiffs have tried to squarely plead that creator identity and license terms accompanying the works are CMI, but this will be a battlefield.

Net: the evidence is “good enough to sue,” but not yet “good enough to win.” These cases are designed to reach discovery, where the real proof would be: logs of ingestion, dataset manifests, preprocessing code, metadata retention choices, and internal discussions about licensing/attribution risk.

5) Lessons for other creators and rights owners

  1. Stop thinking the only fight is “training = infringement.”
    These complaints demonstrate a different axis of control: provenance and attribution integrity. §1202 claims can become a pressure tool even when fair-use questions remain unsettled.

  2. Treat machine-readable attribution and license data as enforcement infrastructure, not etiquette.
    If your licensing regime relies on attribution or non-commercial limitations, ensure those signals exist in forms that can be proven, tracked, and audited (platform metadata, embedded metadata, consistent identifiers). The litigation theory here assumes that “CMI exists,” “it was removed,” and “its removal mattered.” (A concrete example of an embedded rights signal appears after this list.)

  3. Target the “dataset-to-product” chain, not just the model maker.
    Roblox is sued here not because it’s the canonical “AI lab,” but because it allegedly deployed a generative system into a monetized developer ecosystem. The commercial integration point—APIs, developer platforms, enterprise tools—is where courts may be more receptive to harm and remedies.

  4. Ask for remedies that force accounting, not just money.
    A notable request across these complaints is an order requiring defendants to identify and account for works incorporated into datasets/training pipelines and to implement compliance measures. That’s strategically important: it pushes toward transparency, not just damages.

  5. Class framing is not just scale—it’s feasibility.
    The complaints lean on the argument that individual creators can’t practically litigate one-by-one. If courts accept that, §1202 class actions become a repeatable enforcement template for other creator communities whose works are systematically ingested.
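
Picking up on item 2 above: here is a minimal sketch of what an embedded, machine-readable rights signal can look like in practice. The glTF 2.0 format (a common 3D interchange format stored as JSON) defines an asset.copyright field; the extras keys below are illustrative conventions, not a standard, and the helper function is hypothetical.

```python
# Sketch: stamping attribution/license signals directly into a .gltf file
# (glTF documents are JSON). asset.copyright is part of the glTF 2.0 spec;
# the "extras" keys here are illustrative, not standardized CMI fields.
import json

def stamp_rights_metadata(gltf_path: str, creator: str,
                          license_url: str, source_url: str) -> None:
    with open(gltf_path, "r", encoding="utf-8") as f:
        doc = json.load(f)
    asset = doc.setdefault("asset", {"version": "2.0"})
    asset["copyright"] = f"(c) {creator}. Licensed under {license_url}"
    asset.setdefault("extras", {}).update({
        "creator": creator,        # illustrative key
        "license": license_url,    # illustrative key
        "source": source_url,      # illustrative key
    })
    with open(gltf_path, "w", encoding="utf-8") as f:
        json.dump(doc, f, indent=2)
```

Embedded fields like these travel with the file itself, which makes later disputes about whether CMI “accompanied” the work easier to document than platform-page metadata alone.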

6) Predictions: what outcomes are most plausible?

Most likely near-term path: motions to dismiss + narrowing.
Expect defendants to attack: (i) whether the pleaded facts plausibly show intentional CMI removal; (ii) whether the metadata qualifies as CMI; (iii) whether the “induce/enable/facilitate/conceal” knowledge element is adequately pleaded; and (iv) whether the plaintiff can tie specific works to the defendant’s pipeline without speculation. Some claims may be narrowed, but I would not be surprised if at least part of the §1202 theory survives—especially where the complaints rely on public disclosures about datasets and processing steps.

Medium term: discovery pressure → compliance-style settlement.
If any defendant faces meaningful discovery into ingestion logs, preprocessing pipelines, and internal licensing discussions, settlement incentives rise. The most realistic settlement shape is not “delete all models,” but: (a) monetary fund; (b) forward-looking CMI preservation / provenance measures; (c) dataset governance commitments; (d) possibly a creator opt-out/registry or attribution mechanism where feasible.
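
In engineering terms, “forward-looking CMI preservation / provenance measures” could look like the hypothetical sketch below, in which every preprocessed representation is written alongside a sidecar record carrying the original creator, license, and source URL. The record schema is invented for illustration; it is not drawn from the complaints or from any defendant’s practice.

```python
# Hypothetical provenance sidecar: keep CMI attached to each preprocessed
# asset instead of discarding it during conversion. Schema is illustrative.
import hashlib
import json
from pathlib import Path

def write_provenance(out_dir: str, asset_id: str, source_url: str,
                     creator: str, license_url: str, payload: bytes) -> Path:
    """Save the model-ready payload plus a JSON record of where it came from."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{asset_id}.bin").write_bytes(payload)
    record = {
        "asset_id": asset_id,
        "source_url": source_url,          # link back to the repository page
        "creator": creator,                # uploader / rights holder
        "license": license_url,            # e.g. a Creative Commons deed URL
        "sha256": hashlib.sha256(payload).hexdigest(),  # ties record to bytes
    }
    sidecar = out / f"{asset_id}.provenance.json"
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```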

Longer term: a judicial split on whether “failure to preserve metadata” is “removal.”
The big doctrinal question is whether courts treat ML preprocessing that discards metadata as actionable “removal/alteration,” or as a non-culpable byproduct of transformation. Different judges could land differently, producing inconsistent district-court outcomes until an appellate court clarifies the boundary.

Most consequential potential outcome (if plaintiffs win a key ruling): §1202 becomes the scalable enforcement layer for AI provenance.
If a court credibly endorses (even partially) the idea that stripping license/attribution metadata in training pipelines violates §1202, the implications go beyond 3D assets. It would encourage rights owners to build machine-readable rights signals and then litigate removal as the core wrong—turning provenance into a legally enforceable technical requirement rather than a voluntary norm.