Chinese AI lab DeepSeek recently launched an updated version of its R1 reasoning model. The new model performs strongly on several math and coding benchmarks, but DeepSeek has not disclosed the source of its training data. Some AI experts speculate that DeepSeek may have used data from Google’s Gemini family of AI models.
Sam Paech, a Melbourne-based developer who studies AI emotional intelligence, posted on X what he describes as evidence that DeepSeek’s latest model, R1-0528, shows a strong preference for words and expressions similar to those favored by Google’s Gemini 2.5 Pro, highlighting striking similarities between the two models’ output.
Earlier this year, OpenAI told the Financial Times it had found evidence linking DeepSeek to the practice of distillation, a technique for training smaller AI models on the outputs of larger, more capable ones. According to Bloomberg, Microsoft noticed large amounts of data moving through OpenAI developer accounts late last year, and OpenAI suspects those accounts are linked to DeepSeek.
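DeepSeek has not said how R1-0528 was trained, so as a point of reference only, here is a minimal sketch of classic logit-based knowledge distillation (the Hinton-style formulation): the student model is trained to match the teacher's temperature-softened output distribution rather than hard labels. All function names here are illustrative, and note that commercial APIs generally do not expose logits, so distillation from an API in practice usually means fine-tuning on generated text instead.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-top classes.
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened outputs.

    The student minimizes this divergence so its predictions mimic
    the teacher's soft targets. (Illustrative sketch, not DeepSeek's
    actual training recipe.)
    """
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's predictions
    eps = 1e-12                                # guard against log(0)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# A student that already matches the teacher incurs (near) zero loss;
# a mismatched student incurs a larger one.
teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))          # ~0.0
print(distillation_loss([0.1, 1.0, 2.0], teacher))  # noticeably larger
```

In a real training loop this KL term would be backpropagated through the student's parameters, often mixed with a standard cross-entropy loss on ground-truth labels.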
Although distillation is a common AI training method, OpenAI’s terms prohibit using its model outputs to build competing systems. Yet, distinguishing AI-generated data from human-written content remains challenging. The web now contains many AI-generated articles and posts, often called “AI slop.” Content farms and bots flood platforms like Reddit and X with synthetic text. This contamination makes it difficult to filter out AI-generated data during model training.
Despite this, some experts believe DeepSeek might have relied heavily on Gemini-generated content. Nathan Lambert, a researcher at the nonprofit AI2, wrote on X: “If I were DeepSeek, I would generate massive synthetic data from the best API model available.” Lambert reasoned that DeepSeek is flush with cash but short on GPUs, which makes the strategy an effective way to stretch its compute.
To counter distillation risks, AI companies have tightened security measures. In April, OpenAI began requiring ID verification for access to some advanced models; the process demands a government-issued ID from one of its supported countries. Notably, China, where DeepSeek is based, is not on that list.
Google also recently began “summarizing” the reasoning traces produced by models on its AI Studio platform, a step that complicates efforts to train rival models on Gemini outputs. Likewise, in May, Anthropic said it would begin summarizing its own models’ traces to protect its competitive edge.
We have reached out to Google for a comment and will update this story if we receive a response.
For more tech updates, visit DC Brief.