
The New Data Deal: How ChatGPT, Claude, Gemini, Perplexity and DeepSeek Handle Your Conversations in 2026

By Al Kags

Deep research conducted using Gemini, Perplexity and Claude

Generative AI systems are no longer novelties. They sit in browsers at work, on mobile devices, in email clients and operating systems. Every time someone asks ChatGPT to draft a contract, or sends Claude a research memo, or lets Gemini read their messages, a quiet transaction takes place: human context in, machine capability out.

The questions this blog post asks are straightforward: what do these systems collect, how long do they keep it, do they train on it, can you leave with a copy of your data, and whose law governs the answers?

The answers show a consistent pattern: generous data rights and strict limits for paying institutions, default‑on data collection for ordinary users, and growing gaps between legal theory and practical control.

What the big systems collect

Every provider in this post — OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini), Perplexity, and DeepSeek — gathers three types of information: the content you submit (prompts, files and feedback), account and profile details, and technical data such as device, IP address and usage logs.

This is the baseline. The key differences lie in how deeply each system reaches into the rest of your life.

Gemini: one assistant, many data streams

Gemini is built as a universal Google assistant. Google’s own privacy hub lists the services it can draw from: Gmail, Docs, Drive, Calendar, Maps, YouTube, Photos, and other Workspace apps, plus your device’s call and message logs when enabled on Android. Through “Connected apps”, Gemini can read across these services to answer questions and carry out actions.

In July 2025, Google rolled out automatic access for Gemini to WhatsApp, Messages and Phone on many Android devices. Malwarebytes and others documented that this integration was applied by default unless users had previously gone into settings and turned specific connections off. For most people who had never touched those controls, Gemini effectively gained privileged access to their communications.

DeepSeek: cloud in China, models everywhere

DeepSeek’s privacy policy states that when you use its cloud services — the chat website, app, or API — your personal data is stored on servers in the People’s Republic of China. That data is therefore governed by Chinese cybersecurity and national security laws, which can require companies to share information with state authorities.

At the same time, DeepSeek’s models are open‑source. Users can download them or access them via third‑party platforms and run them entirely outside China. In those deployments, prompts never reach DeepSeek’s Chinese infrastructure at all. This distinction matters: “DeepSeek” as a model does not automatically imply “your data goes to China”; DeepSeek as a hosted service does.

Perplexity’s Comet browser: the aggressive edge

Perplexity’s Comet browser sits at the frontier of AI‑powered browsing. Its own Privacy Notice makes clear that Comet can collect Browsing Data (the pages you visit and what they contain) alongside synced items such as passwords and payment methods.

Perplexity stresses that, for synced items like passwords and payment methods, data is stored locally or in secure vaults rather than on its servers, and that Incognito mode avoids storing Browsing Data at all. Even so, the design reveals the direction of travel: a browser that uses AI to “read” much of what passes through it in order to personalise responses.

How long your conversations are kept

Retention rules determine how long data can be used for training, safety review, subpoenas or enforcement. The durations differ not only between platforms but also within the same platform, depending on what you toggle.

Summary picture

| Platform | Default for normal chats | If you turn training off | Flagged or reviewed content | After deletion |
|---|---|---|---|---|
| ChatGPT | Stored until you delete them; no automatic expiry. | Chats stay visible but are not used for training. Abuse‑monitoring logs are retained around 30 days. | Not fully specified; legal holds can override deletion, as in NYT v OpenAI. | Deleted chats are purged from OpenAI systems within about 30 days. |
| Claude | 30 days for most consumer use; up to 5 years in de‑identified form for training‑enabled users. | 30‑day retention continues; no training use. | Inputs and outputs from safety‑flagged conversations up to 2 years; safety classifier scores (metadata) up to 7 years. | Deleted chats disappear from the UI immediately and are removed from back‑end storage within 30 days. |
| Gemini | Activity auto‑deleted after 18 months by default; can be set to 3 or 36 months or “no auto‑delete”. | If you turn off “Keep activity”, most logs are deleted after 72 hours, but human‑reviewed snippets persist up to 3 years. | Human‑reviewed content and associated metadata stored up to 3 years, disconnected from your Google account. | Deleting activity does not affect already reviewed snippets. |
| Perplexity | Stored “as long as reasonably necessary” for service and legal purposes; no fixed horizon. | You can turn off AI Data Retention to stop queries being used for improvement, but retention periods for basic logging remain vague. | Not precisely specified. | Personal data tied to an account is removed within roughly 30 days of account deletion. |
| DeepSeek | Retained “as long as necessary” for services and legal obligations; exact periods unclear. | Users can exercise rights (including objecting to training), but policy does not promise a specific reduced retention window. | Not specified. | Account‑linked data is deleted, subject to regulatory requirements. |
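
The retention picture above can be captured as a small lookup, handy for sanity‑checking your own settings. The durations mirror the summary (with months approximated as 30 days); the dictionary layout and function name are illustrative and mine, not any provider's API.

```python
# Retention windows in days for consumer chats, mirroring the summary above.
# None means the policy gives no fixed horizon.
RETENTION_DAYS = {
    "chatgpt":    {"default": None, "deleted_purge": 30, "abuse_logs": 30},
    "claude":     {"default": 30, "training_enabled": 5 * 365,
                   "flagged_content": 2 * 365, "safety_metadata": 7 * 365},
    "gemini":     {"default": 18 * 30,        # ~18 months, months as 30 days
                   "keep_activity_off": 3,    # 72 hours
                   "human_reviewed": 3 * 365},
    "perplexity": {"default": None, "account_deletion": 30},
    "deepseek":   {"default": None},
}

def retention(platform: str, clock: str = "default"):
    """Return the retention window in days, or None when the policy
    states no fixed horizon (or the clock is not documented)."""
    return RETENTION_DAYS[platform.lower()].get(clock)
```

Note how the same platform carries several clocks at once; "how long is my chat kept?" has no single answer without naming which clock you mean.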

Anthropic’s layered clocks

Anthropic’s approach is the most layered. For Claude consumer products as of late 2025: ordinary chats are kept for 30 days; users who leave the training toggle on accept retention of up to five years in de‑identified form; inputs and outputs from safety‑flagged conversations can be held for up to two years; and safety classifier scores, which are metadata rather than transcripts, persist for up to seven years.

The popular, but inaccurate, shorthand that “safety‑flagged content is stored for seven years” overstates what Anthropic actually keeps. It is the metadata, not the raw conversation, that has the seven‑year tail.

Gemini’s three‑year human review

Gemini’s privacy hub explains that chats may be sampled for human review to improve quality. Those samples are retained for up to three years, stored separately from your Google account, and unaffected by any later deletion of your activity.

So a user can adopt strict settings and still have part of their history persist, in anonymised form, long after the visible logs are gone.
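
A bit of date arithmetic makes these overlapping clocks concrete. The windows are the ones stated in this section (72 hours, 18 months, three years); the function itself is an illustrative sketch, not anything Google publishes.

```python
from datetime import date, timedelta

def gemini_gone_by(last_chat: date, keep_activity: bool,
                   human_reviewed: bool) -> date:
    """Estimate when a Gemini chat is fully gone, given the windows above:
    72 hours with Keep activity off, roughly 18 months by default, and a
    three-year tail for human-reviewed snippets regardless of settings."""
    if human_reviewed:
        return last_chat + timedelta(days=3 * 365)
    if keep_activity:
        return last_chat + timedelta(days=548)   # ~18 months
    return last_chat + timedelta(days=3)         # 72 hours
```

The branch order is the point: human review dominates every user‑facing setting, which is exactly the "strict settings, long tail" problem described above.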

Training on your data

Model training is at the heart of the business model. The broad pattern in 2026 is consistent: consumer accounts default to training; enterprise and regulated accounts default to no training.

Consumer defaults and controls

| Platform | Default training status for typical individuals | How to turn it off | What actually changes |
|---|---|---|---|
| ChatGPT | “Improve the model for everyone” is on by default for Free, Plus and Pro users. | Settings → Data controls → toggle off. | Future chats are not used to improve models. They still appear in your history and are stored until you delete them. Temporary Chats, toggled per conversation, are never used for training and are deleted after 30 days. |
| Claude | The “You can help improve Claude” toggle is presented default‑on when the new policy appears, making it a de facto opt‑out. | Settings → Privacy → turn off. | Your conversations stay under the 30‑day retention regime and are not added to training pipelines, though safety‑flagged content can still be held for two years. |
| Gemini | Gemini Apps Activity and training are enabled if “Keep activity” is on, which is the default for most users. | Turn off “Keep activity” in Gemini Apps Activity. | Most logs are deleted after around 72 hours, but previously human‑reviewed samples persist for up to three years. |
| Perplexity | AI Data Retention is enabled by default on Free, Pro and Max plans. | Toggle off AI Data Retention in settings. | Perplexity stops retaining your questions for model improvement; basic logging for security and operations continues, but the policy does not spell out durations. |
| DeepSeek | Training on user content is allowed by default. | Email privacy@deepseek.com requesting that data not be used for training. | DeepSeek says it will stop using your data for model improvement, subject to technical and legal constraints. |
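
The consumer defaults above share one property: training is on until the individual acts. A compact restatement (the dictionary layout is mine; the toggles and routes are the ones named in the table):

```python
# Consumer-tier training defaults and the opt-out route for each platform.
TRAINING_DEFAULTS = {
    "chatgpt":    {"on_by_default": True,
                   "opt_out": "Settings → Data controls → toggle off"},
    "claude":     {"on_by_default": True,
                   "opt_out": "Settings → Privacy → turn off"},
    "gemini":     {"on_by_default": True,
                   "opt_out": "Turn off Keep activity"},
    "perplexity": {"on_by_default": True,
                   "opt_out": "Toggle off AI Data Retention"},
    "deepseek":   {"on_by_default": True,
                   "opt_out": "Email privacy@deepseek.com"},
}

# The article's core claim, checked mechanically: no consumer tier
# starts with training disabled.
assert all(v["on_by_default"] for v in TRAINING_DEFAULTS.values())
```

The assertion passing is the whole story of this section: on every platform, silence means consent.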

Enterprise and regulated accounts

On the enterprise side, the defaults reverse: ChatGPT Enterprise and Team, Claude for Work, Gemini for Workspace and the major API tiers exclude customer content from training by default, with retention and processing governed by negotiated contracts rather than consumer toggles.

The effect is simple: if you are an individual using a free or low‑cost product, your data is part of the training pool unless you explicitly say otherwise. If you are an enterprise with a contract, you start in a no‑training zone and can opt in on your own terms.

Exporting your data: who lets you walk away with a copy?

Data export is where rhetoric about “user control” meets friction.

ChatGPT: good for individuals, thin for teams

For individual accounts (Free, Plus, Pro), OpenAI offers two export paths:

  1. In‑product: Settings → Data controls → Export data.
  2. Privacy portal: a request via privacy.openai.com.

Exports contain a structured JSON file (conversations.json) and an HTML file (chat.html) that replicates the chat interface for easy reading. Images and table outputs are stored alongside.
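
A few lines of Python are enough to see what such an archive holds. The field names below (title, mapping, message) reflect the layout commonly observed in these exports; treat the structure as an assumption and adapt it to your own archive.

```python
import json

def list_conversations(path_or_data):
    """Summarise a ChatGPT-style conversations.json export as
    (title, message_count) pairs.

    Assumes each conversation is a dict with a 'title' and a 'mapping'
    of message nodes -- the shape commonly seen in these exports.
    """
    if isinstance(path_or_data, str):
        with open(path_or_data, encoding="utf-8") as f:
            data = json.load(f)
    else:
        data = path_or_data

    summary = []
    for convo in data:
        nodes = convo.get("mapping", {}).values()
        n_messages = sum(1 for node in nodes if node.get("message"))
        summary.append((convo.get("title", "(untitled)"), n_messages))
    return summary

# Tiny inline sample in the assumed shape:
sample = [{"title": "Contract draft",
           "mapping": {"a": {"message": {"content": {"parts": ["Hi"]}}},
                       "b": {"message": None}}}]
print(list_conversations(sample))  # [('Contract draft', 1)]
```

Being able to walk your own export like this is the practical meaning of data portability; the sections below show how unevenly the other platforms deliver it.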

For Team and Business workspace users, the story is worse. Multiple 2025–2026 forum threads report that there is no Export Data button in the UI; paying team customers are told to use copy‑and‑paste or negotiate access to an enterprise Compliance API. Several European users have called this a potential GDPR breach because they effectively have fewer portability tools than free‑tier users.

Claude: simple export, controlled by owners

Claude’s export is straightforward: Settings → Privacy → Export data sends a machine‑readable archive of your chats via email. For Claude for Work, the “primary owner” of the workspace can export organisational data, which includes member conversations. This central control is useful for corporate governance but means employees are dependent on administrators for full access to their own histories.

Gemini: Takeout’s narrow path

Gemini uses Google Takeout. To export chats, you must open Takeout, deselect every product, then choose “My Activity” and filter it down to Gemini Apps activity before building the archive.

If you instead select “Gemini” as a product, you mostly download configuration data for custom “Gems”, not your conversations. The archive itself is robust — HTML plus JSON activity logs — but the route to get there is unintuitive.

Perplexity: no one‑click export

Perplexity offers no bulk, one‑click export. Individual threads can be copied or shared, but there is no account‑wide archive button in the product itself.

Perplexity’s GDPR guidance explains that users seeking full copies of their data must either use a Data Privacy Form or email support; the company then has up to 30 days (plus a possible extension) to respond. The existence of numerous third‑party extensions to export threads to Markdown or other formats is itself evidence of this gap.

DeepSeek: export on web, not on mobile

DeepSeek’s web client includes Settings → Data → Export data, which triggers generation of a JSON archive of your chats and account data. The link is time‑limited, often seven days. The mobile app currently exposes only delete controls, not bulk export, forcing mobile‑first users onto the web if they want a copy.

Law, geography, and the politics of “who is the controller”

The same system behaves differently under different laws because each region recognises a “data controller”: the legal entity responsible for processing.

The EU: enforcing rights in practice

The EU’s GDPR gives residents enforceable rights to access, delete, port and object to processing. The EDPB’s 2024 opinion confirmed that “legitimate interest” can justify AI training only when companies can show a genuine interest, real necessity, and a fair balance with individuals’ rights.

Regulators have begun using their tools: Italy’s Garante has fined OpenAI over ChatGPT’s handling of personal data and ordered DeepSeek’s apps blocked pending answers about where user data goes, while other European authorities have pushed providers to pause or rework training on EU users’ content.

These actions show that large AI providers cannot simply gesture at global terms; they must adapt concretely to regional law.

The US: contractual privacy

Without a comprehensive federal privacy law, the US leans on state statutes (such as California’s CCPA/CPRA) and on contract terms. For US users, the strongest protections are usually contractual and tied to whether you interact as a consumer under standard terms of service or as a business customer under a negotiated agreement.

The law here primarily protects organisations that have the leverage to negotiate. Individuals mostly rely on company goodwill and public scrutiny.

China: sovereignty and reach

China’s PIPL and related laws give individuals rights to access and delete their data, but they coexist with cybersecurity and national security frameworks that mandate cooperation with the state in certain contexts. When DeepSeek stores data in China, that data is subject to these obligations. European regulators insist that GDPR still applies when EU residents’ data is involved; the enforcement reality across borders is much more complex.

Shadow AI and the two classes of users

A final dynamic cuts across all platforms: the emergence of “shadow AI”. Employees routinely paste work content into personal ChatGPT, Claude or Gemini accounts, even where corporate policy forbids it. Surveys across 2025 and 2026 show large shares of staff using free or personal AI tools for sensitive work tasks, and a meaningful fraction of data breaches now trace back to such use.​

At the same time, the contractual landscape splits users into two classes: enterprise customers, whose agreements promise no training and tight retention from day one, and consumer users, who start with everything switched on and must find the opt‑outs themselves.

The irony is that the people best placed to understand and manage risk — legal teams, CISOs, data‑protection officers — typically sit in organisations that can buy proper enterprise plans. The wider public, especially outside Europe and North America, interacts with AI on consumer terms that offer far less agency.

What an ordinary user can do

Within this landscape, ordinary users still have meaningful options: turn off the training toggles each platform exposes, use Temporary Chats or Incognito modes for sensitive questions, shorten Gemini’s auto‑delete window from 18 months to 3, export your history periodically while the tools allow it, and keep genuinely confidential material out of consumer accounts altogether.

The new data deal

Across all five platforms, the practical “data deal” in 2026 looks like this: your conversations are collected by default, retained on timelines you only partly control, used to train future models unless you object, and exportable with varying degrees of friction, while enterprise customers negotiate most of those terms away before they start.

The systems themselves are extraordinarily powerful. They are also, as currently built, engines for large‑scale data ingestion. Understanding how ChatGPT, Claude, Gemini, Perplexity and DeepSeek actually treat your conversations is no longer a technical curiosity. It is part of basic digital literacy, and a precondition for any serious public debate about where AI should fit in law, in markets, and in the texture of everyday life.
