By Al Kags
Deep Research using Gemini, Perplexity and Claude

Generative AI systems are no longer novelties. They sit in browsers at work, on mobile devices, in email clients and operating systems. Every time someone asks ChatGPT to draft a contract, or sends Claude a research memo, or lets Gemini read their messages, a quiet transaction takes place: human context in, machine capability out.
The questions this blog post asks are straightforward:
- What exactly do the major AI platforms collect?
- How long do they keep it, and do they feed it back into training?
- How easy is it to reclaim or export your own data?
- How do these practices change between the EU, the US and China?
The answers show a consistent pattern: generous data rights and strict limits for paying institutions, default‑on data collection for ordinary users, and growing gaps between legal theory and practical control.
What the big systems collect
Every provider in this post — OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini), Perplexity, and DeepSeek — gathers three types of information.
- Account and billing details – names, emails, phone numbers, payment information, and sometimes organisation identifiers.
- Content you submit and receive – prompts, chat histories, uploaded files, images, code, and, in some cases, audio, video, and screenshots.
- Technical and behavioural data – IP address, device type, browser, approximate location, cookies, crash logs and click patterns.
This is the baseline. The key differences lie in how deeply each system reaches into the rest of your life.
Gemini: one assistant, many data streams

Gemini is built as a universal Google assistant. Google’s own privacy hub lists the services it can draw from: Gmail, Docs, Drive, Calendar, Maps, YouTube, Photos, and other Workspace apps, plus your device’s call and message logs when enabled on Android. Through “Connected apps”, Gemini can read across these services to answer questions and carry out actions.
In July 2025, Google rolled out automatic access for Gemini to WhatsApp, Messages and Phone on many Android devices. Malwarebytes and others documented that this integration was applied by default unless users had previously gone into settings and turned specific connections off. For most people who had never touched those controls, Gemini effectively gained privileged access to their communications.
DeepSeek: cloud in China, models everywhere

DeepSeek’s privacy policy states that when you use its cloud services — the chat website, app, or API — your personal data is stored on servers in the People’s Republic of China. That data is therefore governed by Chinese cybersecurity and national security laws, which can require companies to share information with state authorities.
At the same time, DeepSeek’s models are open‑source. Users can download them or access them via third‑party platforms and run them entirely outside China. In those deployments, prompts never reach DeepSeek’s Chinese infrastructure at all. This distinction matters: “DeepSeek” as a model does not automatically imply “your data goes to China”; DeepSeek as a hosted service does.
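To make the distinction concrete, here is a minimal sketch of running one of DeepSeek's open-weight checkpoints locally with the Hugging Face transformers library. The model ID shown is one published checkpoint and an assumption for illustration; swap in whichever variant fits your hardware. Run this way, prompts never leave your own machine.

```python
# Minimal local-inference sketch: the prompt below never leaves your machine.
# Assumes `pip install transformers accelerate torch` and enough memory for a
# 7B model; the model ID is one published checkpoint, swap it as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format the conversation with the model's own chat template, then generate.
messages = [{"role": "user", "content": "Summarise the key terms of this NDA."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```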
Perplexity’s Comet browser: the aggressive edge

Perplexity’s Comet browser sits at the frontier of AI‑powered browsing. Its own Privacy Notice makes clear that Comet can collect:
- Browsing history, including URLs, page text, images, and downloads (“Browsing Data”).
- Technical data such as OS, hardware specs, memory and IP address.
- Additional data if you enable profile sync: saved passwords, security keys, payment methods, installed add‑ons, bookmarks, and precise geolocation, plus microphone and camera data where you grant access.
Perplexity stresses that, for synced items like passwords and payment methods, data is stored locally or in secure vaults rather than on its servers, and that Incognito mode avoids storing Browsing Data at all. Even so, the design reveals the direction of travel: a browser that uses AI to “read” much of what passes through it in order to personalise responses.
How long your conversations are kept
Retention rules determine how long data can be used for training, safety review, subpoenas or enforcement. The durations differ not only between platforms but also within the same platform, depending on what you toggle.
Summary picture
| Platform | Default for normal chats | If you turn training off | Flagged or reviewed content | After deletion |
| --- | --- | --- | --- | --- |
| ChatGPT | Stored until you delete them; no automatic expiry. | Chats stay visible but are not used for training. Abuse‑monitoring logs are retained for around 30 days. | Not fully specified; legal holds can override deletion, as in NYT v OpenAI. | Deleted chats are purged from OpenAI systems within about 30 days. |
| Claude | 30 days for most consumer use; up to 5 years in de‑identified form for training‑enabled users. | 30‑day retention continues; no training use. | Inputs and outputs from safety‑flagged conversations up to 2 years; safety classifier scores (metadata) up to 7 years. | Deleted chats disappear from the UI immediately and are removed from back‑end storage within 30 days. |
| Gemini | Activity auto‑deleted after 18 months by default; can be set to 3 or 36 months or “no auto‑delete”. | If you turn off “Keep activity”, most logs are deleted after 72 hours, but human‑reviewed snippets persist up to 3 years. | Human‑reviewed content and associated metadata stored up to 3 years, disconnected from your Google account. | Deleting activity does not affect already reviewed snippets. |
| Perplexity | Stored “as long as reasonably necessary” for service and legal purposes; no fixed horizon. | You can turn off AI Data Retention to stop queries being used for improvement, but retention periods for basic logging remain vague. | Not precisely specified. | Personal data tied to an account is removed within roughly 30 days of account deletion. |
| DeepSeek | Retained “as long as necessary” for services and legal obligations; exact periods unclear. | Users can exercise rights (including objecting to training), but the policy does not promise a specific reduced retention window. | Not specified. | Account‑linked data is deleted, subject to regulatory requirements. |
Anthropic’s layered clocks

Anthropic’s approach is the most layered. For Claude consumer products as of late 2025:
- If you do not allow training, prompts and responses live in back‑end logs for up to 30 days.
- If you do allow training, Anthropic can retain de‑identified conversation data for up to five years in model training pipelines. This applies to new or resumed chats after the toggle is enabled; it is not retroactive.
- Safety‑flagged conversations are stored for up to two years, while the associated safety classifier scores can persist for seven years.
The popular shorthand that “safety‑flagged content is stored for seven years” therefore overstates what Anthropic actually keeps. It is the metadata, not the raw conversation, that has the seven‑year tail.
Gemini’s three‑year human review
Gemini’s privacy hub explains that chats may be sampled for human review to improve quality. Those samples are:
- Stripped of direct account identifiers.
- Stored for up to three years, even if you delete your activity or turn off logging.
So a user can adopt strict settings and still have part of their history persist, in anonymised form, long after the visible logs are gone.
Training on your data
Training on user data sits at the heart of the business model. The broad pattern in 2026 is consistent: consumer accounts default to training; enterprise and regulated accounts default to no training.
Consumer defaults and controls
| Platform | Default training status for typical individuals | How to turn it off | What actually changes |
| --- | --- | --- | --- |
| ChatGPT | “Improve the model for everyone” is on by default for Free, Plus and Pro users. | Settings → Data controls → toggle off. | Future chats are not used to improve models. They still appear in your history and are stored until you delete them. Temporary Chats, toggled per conversation, are never used for training and are deleted after 30 days. |
| Claude | The “You can help improve Claude” toggle is presented default‑on when the new policy appears, making it a de facto opt‑out. | Settings → Privacy → turn off. | Your conversations stay under the 30‑day retention regime and are not added to training pipelines, though safety‑flagged content can still be held for two years. |
| Gemini | Gemini Apps Activity and training are enabled if “Keep activity” is on, which is the default for most users. | Turn off “Keep activity” in Gemini Apps Activity. | Most logs are deleted after around 72 hours, but previously human‑reviewed samples persist for up to three years. |
| Perplexity | AI Data Retention is enabled by default on Free, Pro and Max plans. | Toggle off AI Data Retention in settings. | Perplexity stops retaining your questions for model improvement; basic logging for security and operations continues, but the policy does not spell out durations. |
| DeepSeek | Training on user content is allowed by default. | Email [email protected] requesting that data not be used for training. | DeepSeek says it will stop using your data for model improvement, subject to technical and legal constraints. |
Enterprise and regulated accounts
On the enterprise side, the defaults reverse:
- OpenAI’s enterprise privacy documentation states that data from ChatGPT Enterprise, Business, Team, Edu, Healthcare, Teachers and API is not used for training unless a customer opts in.
- Anthropic forbids training on Claude for Work, Claude Gov, Education, and on data processed via Amazon Bedrock or Google Vertex AI.
- Google’s Gemini for Workspace promises that customer data stays within the organisation and is not used to improve models for others.
- Perplexity’s Data Processing Addendum for enterprise commits that customer data is excluded from training.
The effect is simple: if you are an individual using a free or low‑cost product, your data is part of the training pool unless you explicitly say otherwise. If you are an enterprise with a contract, you start in a no‑training zone and can opt in on your own terms.
Exporting your data: who lets you walk away with a copy?
Data export is where rhetoric about “user control” meets friction.
ChatGPT: good for individuals, thin for teams
For individual accounts (Free, Plus, Pro), OpenAI offers two export paths:
- In‑product: Settings → Data controls → Export data.
- Privacy portal: a request via privacy.openai.com.
Exports contain a structured JSON file (conversations.json) and an HTML file (chat.html) that replicates the chat interface for easy reading. Images and table outputs are included alongside these files.
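If you want to work with the archive programmatically, a short script suffices. The sketch below assumes the structure seen in recent exports — a JSON list of conversations, each with a “mapping” of message nodes — so treat the field names as assumptions and check them against the archive you actually receive.

```python
# Hedged sketch: print a readable transcript from a ChatGPT export archive.
# Field names ("mapping", "message", "content", "parts") reflect recent
# exports and may change; inspect your own conversations.json first.
import json

with open("conversations.json", encoding="utf-8") as f:
    conversations = json.load(f)

for convo in conversations:
    print(f"=== {convo.get('title', 'Untitled')} ===")
    for node in convo.get("mapping", {}).values():
        msg = node.get("message")
        if not msg:
            continue
        parts = msg.get("content", {}).get("parts") or []
        text = " ".join(p for p in parts if isinstance(p, str)).strip()
        if text:
            print(f"[{msg['author']['role']}] {text[:200]}")
```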
For Team and Business workspace users, the story is worse. Multiple 2025–2026 forum threads report that there is no Export Data button in the UI; paying team customers are told to use copy‑and‑paste or negotiate access to an enterprise Compliance API. Several European users have called this a potential GDPR breach because they effectively have fewer portability tools than free‑tier users.
Claude: simple export, controlled by owners
Claude’s export is straightforward: Settings → Privacy → Export data sends a machine‑readable archive of your chats via email. For Claude for Work, the “primary owner” of the workspace can export organisational data, which includes member conversations. This central control is useful for corporate governance but means employees are dependent on administrators for full access to their own histories.
Gemini: Takeout’s narrow path
Gemini uses Google Takeout. To export chats, you must:
- Go to takeout.google.com.
- Deselect all services.
- Under My Activity, pick only Gemini Apps.
If you instead select “Gemini” as a product, you mostly download configuration data for custom “Gems”, not your conversations. The archive itself is robust — HTML plus JSON activity logs — but the route to get there is unintuitive.
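Once the archive arrives, the activity log can be read with a few lines of code. The file path and record fields below are assumptions based on how Takeout typically packages “My Activity” exports (a JSON list of records with “title” and “time” fields); verify them against your own download.

```python
# Hedged sketch: list Gemini prompts from a Google Takeout archive.
# The path and record fields are assumptions; Takeout layouts vary by
# locale and date, so verify against the archive you actually download.
import json
from pathlib import Path

activity_file = Path("Takeout/My Activity/Gemini Apps/MyActivity.json")
records = json.loads(activity_file.read_text(encoding="utf-8"))

for record in records:
    # Each record is one interaction; "title" usually carries the prompt text.
    timestamp = record.get("time", "unknown time")
    print(f"{timestamp}  {record.get('title', '')[:100]}")
```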
Perplexity: no one‑click export
Perplexity offers:
- Per‑answer exports (e.g. PDF of a generated report) and exports of created assets such as slides or HTML.
- No self‑service “download everything” button for conversation history.
Perplexity’s GDPR guidance explains that users seeking full copies of their data must either use a Data Privacy Form or email support; the company then has up to 30 days (plus a possible extension) to respond. The existence of numerous third‑party extensions to export threads to Markdown or other formats is itself evidence of this gap.
DeepSeek: export on web, not on mobile
DeepSeek’s web client includes Settings → Data → Export data, which generates a JSON archive of your chats and account data. The download link is time‑limited and often expires after seven days. The mobile app currently exposes only delete controls, not bulk export, forcing mobile‑first users onto the web if they want a copy.
Law, geography, and the politics of “who is the controller”
The same system behaves differently under different laws because each region designates its own “data controller”: the legal entity responsible for the processing.
- For EEA and Swiss users, OpenAI Ireland Limited is the controller; for others, OpenAI OpCo, LLC in the US.
- For EEA/UK/Swiss users of Claude, Anthropic Ireland Limited plays this role; elsewhere, Anthropic PBC in the US.
- For Gemini users in Europe, Google Ireland Limited is the provider; elsewhere it is Google LLC.
- DeepSeek’s controller is a Chinese company, with an EU representative but no EU subsidiary.
- Perplexity operates primarily from the US, with a UK‑based data protection officer and Data Privacy Framework certifications.
The EU: enforcing rights in practice
The EU’s GDPR gives residents enforceable rights to access, delete, port and object to processing. The EDPB’s 2024 opinion confirmed that “legitimate interest” can justify AI training only when companies can show a genuine interest, real necessity, and a fair balance with individuals’ rights.
Regulators have begun using their tools:
- In December 2024, Italy’s Garante fined OpenAI €15 million, citing unlawful data processing, poor breach notification, and child‑protection failures.
- In January 2025, the same authority issued an emergency limitation on DeepSeek’s processing of Italians’ data after DeepSeek claimed EU rules did not apply. This effectively banned the service from Italy and prompted further scrutiny from other EU regulators.
These actions show that large AI providers cannot simply gesture at global terms; they must adapt concretely to regional law.
The US: contractual privacy
Without a comprehensive federal privacy law, the US leans on state statutes (such as California’s CCPA/CPRA) and on contract terms. For US users, the strongest protections are usually contractual and tied to whether you interact as:
- a consumer (click‑through terms, default‑on training, long retention), or
- a customer (enterprise or public‑sector contract, no training, shorter logs, audit rights).
The law here primarily protects organisations that have the leverage to negotiate. Individuals mostly rely on company goodwill and public scrutiny.
China: sovereignty and reach
China’s PIPL and related laws give individuals rights to access and delete their data, but they coexist with cybersecurity and national security frameworks that mandate cooperation with the state in certain contexts. When DeepSeek stores data in China, that data is subject to these obligations. European regulators insist that GDPR still applies when EU residents’ data is involved; the enforcement reality across borders is much more complex.
Shadow AI and the two classes of users
A final dynamic cuts across all platforms: the emergence of “shadow AI”. Employees routinely paste work content into personal ChatGPT, Claude or Gemini accounts, even where corporate policy forbids it. Surveys across 2025 and 2026 show large shares of staff using free or personal AI tools for sensitive work tasks, and a meaningful fraction of data breaches now trace back to such use.
At the same time, the contractual landscape splits users into two classes:
- Subjects – individual users whose data helps train models by default, who face long retention and limited export.
- Clients – enterprises and institutions whose data sits behind strong contractual walls, who get no‑training guarantees, short logs, and often custom export capabilities.
The irony is that the people best placed to understand and manage risk — legal teams, CISOs, data‑protection officers — typically sit in organisations that can buy proper enterprise plans. The wider public, especially outside Europe and North America, interacts with AI on consumer terms that offer far less agency.
What an ordinary user can do
Within this landscape, ordinary users still have meaningful options.
- Limit what you feed the system. Treat every prompt as if it may be retained and, in some form, reused. Avoid entering secrets, trade strategies, or uniquely identifying personal information.
- Change the defaults. Turn off ChatGPT’s “Improve the model for everyone” and use Temporary Chats. Adjust Gemini’s activity auto‑delete and consider turning activity off completely. Disable AI Data Retention in Perplexity if you are not comfortable with training.
- Export regularly wherever possible. Use ChatGPT’s export on personal accounts, Claude’s export, and Google Takeout for Gemini. For Perplexity and DeepSeek, be prepared to go through formal request channels when you need a full archive.
- Treat work data as institutional. If you are using AI for organisational work, push for proper enterprise accounts with no‑training guarantees. Failing that, treat shadow AI as a risk and minimise what you share.
The new data deal
Across all five platforms, the practical “data deal” in 2026 looks like this:
- For individuals, privacy is something you must actively negotiate with settings, export requests and deletion habits.
- For enterprises and governments, privacy is something you buy — in the form of dedicated contracts, audit rights and zero‑training commitments.
- Regulators in the EU are starting to press back, but enforcement is uneven elsewhere, and cross‑border control over actors like DeepSeek is still experimental.
The systems themselves are extraordinarily powerful. They are also, as currently built, engines for large‑scale data ingestion. Understanding how ChatGPT, Claude, Gemini, Perplexity and DeepSeek actually treat your conversations is no longer a technical curiosity. It is part of basic digital literacy, and a precondition for any serious public debate about where AI should fit in law, in markets, and in the texture of everyday life.