OpenAI Investigates DeepSeek for Potential Data Misuse

OpenAI is investigating whether DeepSeek, a Chinese AI startup, improperly used OpenAI's data to train its own AI models, sparking concerns about intellectual property theft.

Feb 11, 2025

OpenAI is investigating whether DeepSeek, a Chinese AI startup, improperly used data from OpenAI's models to develop its own AI technologies. The investigation was prompted by DeepSeek's unveiling of a cost-effective large language model (LLM) that rivaled existing market offerings.

OpenAI suspects DeepSeek may have used a technique called "distillation," which involves training an LLM using data produced by another LLM. OpenAI's terms of service reportedly restrict this practice.

An OpenAI spokesperson stated that the company is "aware of and are examining signs that DeepSeek may have improperly distilled our models and will provide updates as we gather more information." OpenAI added that it is taking "strong, proactive measures to safeguard our technology" and will collaborate with the U.S. government to protect its models.

Microsoft, a major investor in OpenAI, also detected unusual data extraction activity linked to DeepSeek. Security researchers observed individuals believed to be connected to DeepSeek transferring large volumes of data through OpenAI's application programming interface (API).

DeepSeek has not yet responded to requests for comment. The company's recent release of its R1 reasoning model has challenged industry assumptions, delivering competitive performance at a substantially lower cost than established offerings.

The investigation has raised concerns about data security and intellectual property protection. David Sacks, the White House AI advisor, believes there is "substantial evidence" that DeepSeek engaged in distillation from OpenAI.

The situation has also drawn scrutiny from regulators abroad. Italy's data protection authority has requested that DeepSeek disclose what personal data it collects, the sources of that information, and the legal basis for its processing.