Publishers Sue Meta Over AI Book Piracy, Alleging Zuckerberg Personally Authorized It

Five major publishing companies and bestselling novelist Scott Turow filed a federal class action lawsuit Tuesday against Meta Platforms Inc. and its chief executive Mark Zuckerberg, accusing the technology giant of knowingly downloading approximately 81.7 terabytes of pirated books and scholarly texts from shadow library websites to train its Llama artificial intelligence models — and alleging that Zuckerberg personally authorized the decision to do so.

The lawsuit, filed May 5 in the U.S. District Court for the Southern District of New York, names Elsevier Inc., Cengage Learning Inc., Hachette Book Group Inc., Macmillan Publishing Group LLC, and McGraw Hill LLC as plaintiffs. Their catalogs span academic journals, college textbooks, and mass-market literary titles, and together represent a major share of English-language publishing output. The case is a putative class action, meaning the plaintiffs seek to represent all authors and publishers whose works were included in Meta’s Llama training data without authorization. The works covered by that class could ultimately number in the millions of individual titles.

Scott Turow, the author of legal thrillers including Presumed Innocent and Reversible Errors and a former president of the Authors Guild, joined the suit as a named plaintiff. Turow has been among the publishing world’s most prominent voices warning of the financial threat that AI systems pose to working writers and the publishing industry.

What the Complaint Alleges

The complaint describes a deliberate, executive-directed course of mass piracy. According to the filing, Meta employees used BitTorrent to download approximately 81.7 terabytes of copyrighted material from two websites that operate as unauthorized repositories of published works: Library Genesis, widely known as LibGen, and Anna’s Archive.

LibGen is one of the internet’s largest shadow libraries. Founded around 2008, it hosts millions of academic papers, textbooks, reference works, and trade books spanning virtually every academic discipline and literary genre, typically scanned or converted from legally purchased copies and distributed without the knowledge or consent of publishers or authors. Anna’s Archive, a more recently established platform, functions as a search engine and aggregator that mirrors LibGen’s holdings alongside those of similar piracy repositories and provides unified access to their combined catalogs.

Neither site operates with permission from the copyright holders whose works it hosts. Both have been subjects of prior legal action. Elsevier, one of the plaintiffs in the current lawsuit, won a $15 million default judgment against LibGen operators in 2017. That enforcement history makes the complaint’s allegations about Meta’s use of the same sites particularly striking: the company is accused of adopting a piracy pipeline whose operators had already been found liable, nearly a decade earlier, to one of the publishers now suing it.

The plaintiff publishers collectively produce a significant portion of the world’s academic and educational publishing. Elsevier alone publishes thousands of peer-reviewed journals. Cengage produces textbooks used across American higher education. Hachette, Macmillan, and McGraw Hill together account for a large share of both trade and educational publishing. The complaint alleges that Meta’s unauthorized copying of their catalogs gave Llama a training advantage built directly on the publishers’ intellectual property without any compensation.

Zuckerberg’s Role in the Decision

The complaint’s most consequential allegations concern how Meta made the choice to use these piracy sites in the first place. According to the lawsuit, Meta initially explored licensing agreements with publishers to obtain training data at scale. The company’s internal discussions included proposals to substantially increase its dataset licensing budget. But that effort was derailed, the complaint alleges, when the question of whether to license or pirate was escalated directly to Zuckerberg.

At that point, according to the filing, Meta’s business development team received verbal instructions to stop pursuing licensing negotiations. Zuckerberg personally signed off on using LibGen for Llama’s training data, the complaint states, even though internal Meta AI executives had explicitly flagged it as “a dataset we know to be pirated.”

Naming a company’s chief executive as a personal defendant in commercial copyright litigation is relatively uncommon. Plaintiffs in corporate intellectual property cases typically name the company itself and pursue individual defendants only when there is evidence of direct personal involvement in the alleged wrongdoing. By naming Zuckerberg individually alongside Meta, the publishers are advancing the theory that the decision to pirate was not a routine operational choice made below the executive level — it was a specific directive from the top, taken with full knowledge of the legal status of the source material.

That framing, if accepted by the court, could affect both the scope of Meta’s liability and the calculation of any damages award. A finding of willful infringement, meaning infringement committed with knowledge of, or reckless disregard for, the fact that the conduct is unlawful, raises the ceiling on statutory damages under the Copyright Act from $30,000 to $150,000 per infringed work.

The lawsuit arrives as Meta has been restructuring aggressively around artificial intelligence, having conducted its third major workforce reduction in three years. Those cuts fell heavily on advertising and sales roles that Meta’s AI tools are now expected to handle — a transformation that makes the Llama AI platform central to the company’s long-term business case.

Meta’s Defense and the Fair Use Question

Meta responded with a statement that has become a routine opening position in AI copyright litigation. “Courts have rightly found that training AI on copyrighted material can qualify as fair use,” the company said, adding that it intends to “fight this lawsuit aggressively.”

The fair use argument has had qualified success in court, but it carries a critical limitation that the publishers’ complaint is designed to exploit. In a June 2025 ruling in Bartz v. Anthropic, a separate copyright case against the AI company behind the Claude assistant, a federal judge held that training large language models on copyrighted material could qualify as fair use under certain conditions. But the same ruling found that the prior act of obtaining those materials through piracy was not protected by fair use. A company’s transformative use of content in an AI training process does not retroactively legitimize the method by which that content was originally acquired.

That distinction is the core of the publishers’ legal strategy in the Meta case. They are not simply challenging the fact that Llama was trained on their works. They are targeting the specific acts — the BitTorrent downloads from LibGen and Anna’s Archive — that preceded training. Even if Meta could eventually prevail on fair use as to the training itself, it would still face liability for the piracy that made the training possible.

Meta has released its Llama models publicly under what it describes as an open-source license, and they are now used by millions of developers around the world.

A New Front in AI’s Copyright Wars

The Meta lawsuit represents the latest escalation in what has become a sustained legal campaign against the major AI developers over their use of copyrighted material in training data. Tracking the rapidly expanding litigation across the science and technology sector has become a full-time occupation for intellectual property attorneys at several major law firms.

The legal landscape shifted significantly with OpenAI’s abrupt shutdown of its Sora video platform last month, which ended a $1 billion partnership with Disney and signaled how content-rights disputes can force rapid and costly pivots in AI companies’ strategies. OpenAI has also faced copyright claims from authors, news organizations, and other content creators across multiple cases.

Universal Music Publishing Group, Concord Music Group, and ABKCO Music filed a $3.1 billion lawsuit against Anthropic in January 2026, alleging that the company built its Claude AI on a foundation of unlicensed song lyrics. The wave of litigation reflects a broader industry reckoning: AI companies built their early models on datasets scraped from the internet and pirated from shadow libraries at a time when the legal status of that activity was genuinely unsettled. Courts are now beginning to draw lines, and those lines are not uniformly favorable to the AI side.

For the publishers, the Meta case offers something that many prior AI copyright suits have lacked: what the complaint describes as documentary evidence of executive-level knowledge and authorization. Most AI copyright complaints have been built on inference, arguing that companies must have known their training data included pirated material. The allegations in the Meta complaint include internal corporate communications explicitly acknowledging that LibGen was a piracy site and a documented decision to proceed despite that knowledge.

What Comes Next

The litigation will move slowly through the federal civil docket. Motions to dismiss, briefing on class certification, and summary judgment proceedings are all expected before any trial date could be set. Attorneys following the cluster of AI copyright cases have projected that courts are unlikely to issue final rulings on the underlying fair use questions before late 2026 at the earliest.

If the class is certified, the suit could expand to represent a potentially enormous number of authors and publishers. Llama’s training datasets have not been fully disclosed publicly, but prior research and reporting have established that they drew heavily on pirated archives including LibGen and similar sources. The number of individual titles at issue could run into the millions.

The injunctive relief sought by the plaintiffs may be as significant as the damages they are pursuing. The publishers are asking the court to order Meta to destroy all infringing copies of their works in Meta’s possession or control. Whether that obligation would reach already-distributed Llama models, and whether courts would treat a trained AI model as an “infringing copy” of the books used to train it, are questions in AI copyright law that no court has yet conclusively answered. A ruling on that issue, whenever it comes, could have consequences that extend far beyond Meta.

Meta has not yet filed a formal response in court. A deadline for that filing will be set by the presiding judge in the Southern District of New York.