Authors claim Meta used copyrighted materials for AI training despite warnings from its own lawyers

According to a new filing in a copyright infringement case originally brought last summer, Meta Platforms' (NASDAQ:META) own lawyers cautioned it about the legal risks of using thousands of pirated books to train its AI models, but the company did so anyway.
The filing, submitted late Monday night, consolidates two lawsuits against the Facebook and Instagram owner brought by comedian Sarah Silverman, Pulitzer Prize winner Michael Chabon, and other prominent authors, who allege that Meta used their works without permission to train its artificial-intelligence language model, Llama.
Last month, a California judge dismissed part of the Silverman complaint and indicated he would grant the writers leave to amend their claims.
Meta did not immediately respond to a request for comment on the allegations.
The new complaint, filed on Monday, includes chat logs of a Meta-affiliated researcher discussing the procurement of the dataset in a Discord channel, a potentially key piece of evidence suggesting Meta knew its use of the books might not be protected by US copyright law.
In the chat logs cited in the complaint, researcher Tim Dettmers explains his back-and-forth with Meta's legal department over whether using the book files as training data was "legally ok."
"At Facebook, there are a lot of people interested in working with (T)he (P)ile, including myself, but in its current form, we are unable to use it for legal reasons," Dettmers wrote in 2021, referring to a dataset Meta admitted to using to train its first version of Llama, according to the complaint.
A month earlier, Dettmers had written that Meta's lawyers advised him that "the data cannot be used or models cannot be published if they are trained on that data," according to the complaint.
Dettmers did not spell out the lawyers' concerns, but his chat colleagues pointed to "books with active copyrights" as the most likely source of worry. They argued that training on the data should "fall under fair use," the US legal doctrine that permits certain unlicensed uses of copyrighted material.
Dettmers, a doctoral student at the University of Washington, told Reuters he could not immediately comment on the claims.
This year, tech companies have been hit with a flurry of lawsuits from content producers accusing them of stealing copyright-protected works in order to build generative AI models that have become a global sensation and sparked a frenzy of investment.
If successful, the cases might dampen the generative AI frenzy by forcing AI companies to compensate artists, authors, and other content producers for the use of their works, raising the cost of constructing the data-hungry models.
At the same time, new provisional rules governing artificial intelligence in Europe could require companies to disclose the data they use to train their models, potentially exposing them to additional legal risk.
In February, Meta released the first version of its Llama large language model and disclosed a list of datasets used for training, which included "the Books3 section of ThePile." According to the complaint, the person who compiled that dataset has said elsewhere that it contains 196,640 books.
The company did not release training data for its most recent version of the model, Llama 2, which was made commercially available this summer.
Llama 2 is free to use for businesses with fewer than 700 million monthly active users. Its release was seen as a potential game-changer in the market for generative AI software, threatening the dominance of rivals such as OpenAI and Google (NASDAQ:GOOGL), which charge for the use of their models.