Nvidia, the tech giant known for its graphics processing units, is facing a proposed class-action lawsuit from a group of authors over allegations that their copyrighted works were used without permission to train the company’s artificial intelligence platform, NeMo.
Copyrighted Books Used to Train NeMo Language Model
The lawsuit, filed on Friday in a federal court in San Francisco, claims that Nvidia “admitted” to training its NeMo language model on a dataset containing 196,640 books, including novels and other works by the plaintiffs in this case—Brian Keene, Abdi Nazemian, and Stewart O’Nan.
According to the complaint, this dataset, known as “The Pile,” includes a collection called “Books3” and was used to help NeMo simulate natural written language. It was removed in October 2023 due to reported copyright infringement concerns.
Intellectual Property Rights Infringement
The authors allege that by using their copyrighted materials to train its AI system, Nvidia has infringed on their intellectual property rights. They are seeking unspecified damages on behalf of U.S.-based creators whose works were used to develop NeMo’s large language models over the past three years.
NeMo is Nvidia’s platform for deploying generative AI capabilities, which the company promotes as a fast and cost-effective solution for adopting the technology. However, the use of copyrighted training data raises legal questions around fair use and intellectual property protections.
Recent AI Copyright Infringement Lawsuits
The case mirrors lawsuits brought against other tech firms over AI and copyright, including actions by writers and by news organizations such as The New York Times against OpenAI, the creator of ChatGPT, and its backer Microsoft.
Nvidia now finds itself embroiled in a growing debate over the boundaries of fair use as AI systems become increasingly reliant on ingesting and learning from vast troves of copyrighted digital content.