OpenAI must produce internal communications that could be infringement smoking gun

On November 24, 2025, the US District Court for the Southern District of New York directed OpenAI to produce internal, in-house communications about why it deleted several externally obtained datasets from its ChatGPT training database.

The plaintiffs in the case contended that OpenAI employees used “Library Genesis” (aka LibGen), a data set known to contain materials that were not in the public domain, to produce two data sets (“Books1” and “Books2”), which OpenAI in turn used to train its AI models.

But OpenAI deleted those data sets, and when the plaintiffs asked why, OpenAI asserted attorney-client privilege. That left open whether OpenAI deleted them due to “non-use,” was trying to correct an innocent mistake of ingesting the content in error, or was trying to hide copyright infringement.

Data sets used to train GPT-3, including Books1 and Books2. Source: OpenAI

Judge Ona Wang said that “OpenAI has gone back-and-forth on whether ‘non-use’ as a ‘reason’ for the deletion of Books1 and Books2 is privileged at all.” She added: “OpenAI cannot state a ‘reason’ (which implies it is not privileged) and then later assert that the ‘reason’ is privileged to avoid discovery after Plaintiffs were awarded discovery on that topic.” For this and other reasons listed in her opinion and order, the judge ruled that OpenAI had waived privilege.

December 8 deadline

The Court therefore gave OpenAI until December 8, 2025 to produce the communications that the Court had reviewed privately, along with all other written communications with in-house counsel in 2022 regarding the reasons for the deletion of the Books1 and Books2 datasets, and all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.

OpenAI was further directed to log all written communications on the same subject, to the extent they are not already on OpenAI’s privilege log, and to identify on that log the specific communications that relate to the deletion.

OpenAI was also directed to identify, by December 5, 2025, the OpenAI attorneys behind its position that the communications were protected by attorney-client privilege, and to make them available for depositions no later than December 19, 2025.

Piracy Monitor will provide updates.

Why it matters

OpenAI’s internal communications may hold the key to whether OpenAI knowingly violated copyright, a finding that could push the company to settle the lawsuit.

Further reading

OpenAI, Inc., Copyright Infringement Litigation. Opinion & Order re: OpenAI’s deletion of Books1 and Books2 datasets and privilege rulings. Ona T. Wang, US Magistrate Judge. US District Court for the Southern District of New York

Authors Guild et al: OpenAI loses appeal over unlicensed use of fictional works by ChatGPT. Article. October 28, 2025. By Steven Hawley. Piracy Monitor
