The Meta Copyright Lawsuit: Can AI Legally Feast on Pirated Books?
Meta won its AI copyright lawsuit, but judges warn training on pirated books may be unlawful. Explore the legal chaos reshaping AI’s future.
AI INSIGHT
Rice AI (Ratna)
7/14/2025 · 7 min read


The recent landmark ruling in Kadrey v. Meta has ignited fierce debate across the technology and creative industries, exposing a fundamental collision between artificial intelligence advancement and intellectual property rights. On June 25, 2025, U.S. District Judge Vince Chhabria delivered a verdict favoring Meta against authors Sarah Silverman, Ta-Nehisi Coates, and others who accused the company of illegally training its Llama AI on their copyrighted books. While the decision represents a procedural victory for AI developers, it simultaneously revealed profound ethical fissures and legal uncertainties surrounding the systematic use of pirated content for technological profit. This case illuminates the tension between two imperatives: fostering transformative innovation and safeguarding the creative ecosystems that fuel human expression in the digital age—a conflict far from resolved despite the judicial outcome.
Case Background: Deliberate Piracy and the Data Gold Rush
The lawsuit, initiated in 2023, centered on Meta’s acquisition of training data for its Llama large language models (LLMs). Internal corporate communications unsealed by court order revealed a calculated campaign to exploit pirated literary content on an industrial scale. Meta employees explicitly discussed torrenting "at least 81.7 terabytes" of data from notorious piracy hubs like Library Genesis (LibGen) and Anna’s Archive. According to Wired’s investigation, internal memos acknowledged the "medium-high legal risk" but prioritized speed, with directives urging teams to "get as much long-form writing as possible in the next 4–6 weeks... books—all genres." LibGen hosts over 7.5 million books and 81 million research papers, constituting one of the largest illicit text repositories globally (The Atlantic). Employees noted fiction was "great" for training while lamenting only having "700GB" available—a stark reduction of literature to algorithmic fodder.
When an employee expressed unease about "torrenting from a corporate laptop," the practice was escalated to CEO Mark Zuckerberg (referenced as "MZ" in internal communications) and formally approved. This top-down endorsement highlighted Meta’s institutional willingness to leverage piracy despite known legal perils. Internal discussions revealed employees speculated media exposure would "undermine our negotiating position with regulators," indicating awareness of reputational and legal vulnerabilities (Wired). Licensing was dismissed internally as "unreasonably expensive" and "incredibly slow," with procurement allegedly taking "4+ weeks to deliver data." Crucially, employees noted that licensing even "one single book" would undermine Meta’s planned "fair use strategy"—suggesting the piracy was strategic rather than incidental. This stood in stark contrast to competitors like OpenAI, which had established partnerships with publishers such as the Associated Press.
Internal assessments categorized individual books—including Beverly Cleary’s Ramona Quimby, Age 8—as having no "economic value." This corporate calculus ignored the reality that authors like Carmen Maria Machado viewed their works as "a decade of my life," with the violation feeling "so insane" when fed into machines without consequence (The Conversation). Australian author Sophie Cunningham articulated this ethical breach as Meta "treating writers with contempt." The authors argued this constituted "historically unprecedented pirating," directly infringing copyrights and threatening livelihoods by enabling AI systems to flood markets with synthetic competitors. For creators already facing precarious incomes—U.S. median author earnings hover around $20,000 annually, and Australian authors average just AUD$18,200—this represented an existential threat to cultural production itself (The Conversation).
The Legal Turning Point: Fair Use and Market Harm
Judge Chhabria’s ruling hinged on the four-factor "fair use" test under U.S. copyright law but diverged sharply from another recent AI copyright decision involving Anthropic, revealing judicial fragmentation and doctrinal uncertainty.
Transformative Use vs. Market Impact
Chhabria acknowledged that training Llama transformed books into a new AI tool—consistent with precedents like Authors Guild v. Google, where mass digitization for search qualified as fair use. However, he emphasized that transformativeness alone doesn’t guarantee legal protection. The plaintiffs’ case collapsed because they failed to demonstrate that Llama’s outputs directly competed with or devalued their specific books. As Chhabria bluntly stated in his ruling: "The plaintiffs presented no meaningful evidence on market dilution at all" (National Law Review).
Just days earlier, Judge William Alsup ruled in favor of Anthropic, declaring AI training "spectacularly transformative" and minimizing market harm concerns. Alsup controversially compared authors’ fears to complaining that teaching "schoolchildren to write" might increase competition—a comparison Chhabria later dismissed as "inapt" and legally irrelevant (Reed Smith LLP). Critically, Alsup drew a firm line against piracy itself, ruling that Anthropic’s creation of a "permanent library" of pirated books constituted infringement, with statutory damages for willful infringement still pending.
Chhabria decoupled the issue of using pirated books from the fair use analysis of training itself. While acknowledging that Meta torrented the books, he deferred the infringement claims arising from distribution via torrenting to future proceedings. This created a paradoxical outcome: the method of acquisition was legally segregated from the purpose of use. Legal analysts at Reed Smith LLP note this contrasts with the Anthropic ruling, where the piracy itself triggered liability regardless of training’s transformative nature.
Judicial Skepticism and Ethical Reservations
Despite ruling for Meta, Chhabria expressed profound misgivings about the broader implications. He dismissed as "nonsense" Meta’s claim that compensating copyright holders would cripple AI development, noting AI products could generate "billions, even trillions of dollars" (Ars Technica). If copyrighted works were essential, he insisted, companies could "figure out a way to compensate copyright holders." Chhabria warned that unchecked AI training could "dramatically undermine the incentive for human beings to create things the old-fashioned way" by flooding markets with synthetic alternatives. This foreshadowed potential cultural homogenization and lost literary diversity as human creativity becomes economically unviable. The judge explicitly confined his ruling to these 13 plaintiffs, stressing it "does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful" (Bloomberg Law). This invited future lawsuits with stronger evidence of market harm, particularly targeting output regurgitation or indirect substitution.
Global Implications: Beyond U.S. Courtrooms
The Kadrey ruling reverberates across international legal and creative landscapes, exposing divergent approaches to AI governance. The Australian Society of Authors (ASA) has demanded government intervention requiring AI companies to obtain permission before using copyrighted work and provide fair compensation. They also seek clear labeling of AI-generated content and transparency about training data sources (The Conversation). This reflects global creator concerns that U.S.-style "fair use" exceptions could undermine copyright frameworks elsewhere.
The EU’s 2024 AI Act imposes transparency obligations on training data but avoids copyright prohibitions. While requiring disclosure of copyrighted material usage, it stops short of mandating licensing—a compromise criticized by creators as insufficient. U.S. Vice President JD Vance’s rejection of "excessive regulation" as "authoritarian censorship" underscores the transatlantic regulatory divide (Social Media Today). Publishers like HarperCollins have initiated opt-in licensing deals for AI training, offering authors $2,500 for nonfiction backlist titles (split 50/50 with publishers). The Authors Guild argues this undervalues creative labor, advocating for 75% author shares. Platforms like "Created by Humans" aim to certify human-authored works and facilitate ethical AI training agreements—though uptake remains limited.
Unresolved Legal and Technical Quagmires
The ruling leaves critical questions unanswered, ensuring continued litigation and industry uncertainty. Even if training qualifies as fair use, AI companies face unresolved liability when models generate infringing outputs. This is particularly acute for music (melodies, lyrics) and journalism, where AI summaries may substitute for original reporting. The New York Times v. OpenAI/Microsoft will test this frontier, focusing on verbatim output reproduction displacing news subscriptions (Complex Discovery).
Studies reveal Llama 3.1 memorized 42% of Harry Potter and the Philosopher’s Stone, enabling verbatim reproduction—a vulnerability Chhabria’s ruling didn’t address (Wired). Technical "mitigations" like fine-tuning models to refuse reproduction requests remain unproven against sophisticated prompt engineering. LibGen continues operating despite a $30 million U.S. court judgment against it in 2024, highlighting enforcement challenges. Investigative reports note generative AI companies now absorb pirated content into profitable products that compete with originals—escalating ethical concerns beyond mere access (The Atlantic). Non-commercial AI development faces existential risks if stringent licensing requirements emerge. Library associations warn restricting training to public domain works would skew models away from contemporary human expression, privileging corporate entities with licensing resources (Association of Research Libraries).
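To make the memorization finding concrete: the 42% figure reflects how much of the book the model could reproduce in 50-token spans. A much simpler verbatim-overlap check can be sketched in Python. This is an illustrative simplification of my own, not the study's methodology (which scored the model's probability of reproducing each span rather than comparing a single sampled output); the function name and whitespace tokenization are assumptions for the sketch.

```python
def memorized_fraction(source_text: str, model_output: str, window: int = 50) -> float:
    """Rough proxy for span-level memorization: the fraction of
    `window`-token spans of the source text that appear verbatim in a
    model's output. Tokens here are just whitespace-separated words."""
    src_tokens = source_text.split()
    # Normalize whitespace and pad with spaces so matches respect token boundaries.
    out_text = " " + " ".join(model_output.split()) + " "
    total = len(src_tokens) - window + 1
    if total <= 0:
        return 0.0
    spans = (" ".join(src_tokens[i:i + window]) for i in range(total))
    hits = sum(1 for span in spans if " " + span + " " in out_text)
    return hits / total
```

With a tiny window for demonstration, `memorized_fraction("a b c d e f", "x b c d e y", window=3)` scores 0.5, since two of the four 3-token spans of the source reappear verbatim. A real evaluation would use a tokenizer-aligned window and many sampled continuations per prompt.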
Pathways Toward Ethical Coexistence
Sustainable solutions must bridge technological potential and creative integrity. Industry-led mechanisms could mirror music streaming royalties, distributing compensation based on usage metrics. The ASA advocates for statutory licensing frameworks requiring AI companies to contribute to creator funds—similar to Australia’s educational copyright schemes (The Conversation). Embedding "do not train" metadata in digital works, coupled with legislation recognizing such tags, could balance creator autonomy with AI development. Enhanced regurgitation filters and "weighted forgetting" techniques might reduce infringement risks, though their legal weight remains untested.
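The "do not train" mechanism described above could be honored at the corpus-filtering stage of a training pipeline. The following is a minimal sketch under stated assumptions: the `do-not-train` metadata key and the `Document` structure are hypothetical, since no such metadata standard is yet settled or legally recognized.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

def filter_trainable(corpus: list[Document]) -> list[Document]:
    # Opt-out policy: a document stays eligible for training unless its
    # metadata carries a truthy "do-not-train" flag (a hypothetical tag).
    return [doc for doc in corpus if not doc.metadata.get("do-not-train", False)]

corpus = [
    Document("public-domain essay"),
    Document("novel excerpt", {"do-not-train": True}),
]
# filter_trainable(corpus) keeps only the first document
```

The design choice worth noting is the default: an opt-out scheme (absent tag means trainable) favors AI developers, while an opt-in scheme (absent tag means excluded) favors creators, and that single default is precisely what proposed legislation would have to decide.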
Courts must distinguish healthy competition from destructive substitution. Chhabria outlined three proof paths for future plaintiffs: evidence of output regurgitation (e.g., AI reproducing verbatim text), harm to emerging licensing markets for AI training data, and demonstrable "indirect substitution" (AI outputs displacing originals). He deemed fiction authors potentially less vulnerable than nonfiction writers, whose functional content faces higher substitution risks (Ars Technica).
Conclusion: Between Innovation and Exploitation
Judge Chhabria’s ruling is less a vindication of AI practices than a procedural indictment of legal strategy. While affirming that transformative use of copyrighted works for training can qualify as fair use when plaintiffs fail to prove specific market harm, it offers no ethical absolution for relying on pirated content. The decision’s explicit narrowness—confined to 13 authors without broader precedent—combined with judicial warnings about "market dilution," ensures the core conflict will escalate toward appellate courts and likely the Supreme Court.
The path forward demands nuanced solutions recognizing both AI’s societal value and creative labor’s non-fungibility. Blanket assertions that training is always fair use or always infringement ignore the technology’s contextual complexity. As these legal battles unfold, the fundamental question persists: Can we harness AI’s transformative potential without cannibalizing the human creativity that nourishes it? The Meta case answers nothing definitively but guarantees the trial of our technological ethics will intensify—with the soul of cultural production hanging in the balance.
References
The Guardian. "Meta wins AI copyright lawsuit as US judge rules against authors." https://www.theguardian.com/technology/2025/jun/26/meta-wins-ai-copyright-lawsuit-as-us-judge-rules-against-authors
The Atlantic. "The Unbelievable Scale of AI’s Pirated-Books Problem." https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/
Reed Smith LLP. "A New Look at Fair Use: Anthropic, Meta, and Copyright in AI Training." https://www.reedsmith.com/en/perspectives/2025/07/a-new-look-fair-use-anthropic-meta-copyright-ai-training
The Conversation. "Meta allegedly used pirated books to train AI. Australian authors have objected, but US courts may decide if this is fair use." https://theconversation.com/meta-allegedly-used-pirated-books-to-train-ai-australian-authors-have-objected-but-us-courts-may-decide-if-this-is-fair-use-253105
Wired. "Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal." https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/
Social Media Today. "Meta Wins Case Over Its Use of Copyright-Protected Content To Train AI." https://www.socialmediatoday.com/news/meta-wins-legal-case-use-copyright-protected-work-ai-training/752009/
National Law Review. "Meta's AI Copyright Victory: What It Means for the Future of AI Training." https://natlawreview.com/article/metas-ai-copyright-victory-what-it-means-future-ai-training
Bloomberg Law. "Meta Beats Authors’ Copyright Suit Over AI Training on Books." https://news.bloomberglaw.com/litigation/meta-beats-copyright-suit-from-authors-over-ai-training-on-books
Ars Technica. "Book authors made the wrong arguments in Meta AI training case, judge says." https://arstechnica.com/tech-policy/2025/06/judge-dismisses-authors-claims-that-meta-illegally-used-books-to-train-ai/
Association of Research Libraries. "Training Generative AI Models on Copyrighted Works Is Fair Use." https://www.arl.org/blog/training-generative-ai-models-on-copyrighted-works-is-fair-use/
#AICopyright #GenerativeAI #FairUse #TechEthics #CopyrightLaw #AIEthics #AuthorsRights #MetaLawsuit #DigitalTransformation #InnovationVsCopyright #DailyAIInsight