
AI Data Retention Policies Compared: OpenAI vs Anthropic vs Google vs GitHub (2026)

What happens to your code after you send it to ChatGPT, Claude, Gemini, or Copilot? We compare data retention, training opt-outs, and privacy policies across major AI providers.

When you send code to an AI coding assistant, what happens to it? Is it stored? For how long? Could it appear in another user's response? These are critical questions for any team handling sensitive code, and the answers vary significantly across providers.

The Quick Comparison

| Provider | API Retention | Web/App Retention | Used for Training? | DPA Available? |
| --- | --- | --- | --- | --- |
| **OpenAI** (GPT-4, ChatGPT) | 30 days (abuse monitoring) | Retained by default | Opt-out available (API: no by default) | Yes |
| **Anthropic** (Claude) | Zero retention (API) | 90 days (web) | No (API); feedback only (web) | Yes |
| **Google** (Gemini) | Up to 18 months (varies) | Retained | Depends on plan | Yes (Workspace) |
| **GitHub** (Copilot) | Real-time only | N/A | Opt-out available | Yes (Enterprise) |
| **Cursor** | Uses upstream provider | N/A | No | Via upstream provider |

Important caveat: These policies change frequently. Always check the provider's current terms before making compliance decisions.

OpenAI (ChatGPT, GPT-4, GPT-4o)

API Usage

  • Retention: 30 days for abuse and misuse monitoring
  • Training: Not used for training by default (since March 2023)
  • Deletion: Data automatically deleted after 30 days; you can request earlier deletion
  • DPA: Available for API customers

Web/ChatGPT Usage

  • Retention: Conversations stored indefinitely by default
  • Training: Used for model improvement unless you opt out
  • Opt-out: Settings → Data Controls → "Improve the model for everyone" → Off
  • Team/Enterprise: Separate terms, no training by default

What This Means for You

If your developers use the ChatGPT web interface, their conversations (including any code or secrets pasted into them) are retained, and potentially used for training, unless they have manually opted out. The API is safer: 30-day retention and no training by default.

Anthropic (Claude, Claude Code)

API Usage

  • Retention: Zero retention by default. Prompts and responses are not stored after processing.
  • Training: API data is never used for training
  • DPA: Available for API customers
  • Safety evaluations: Anthropic may flag and review conversations that trigger safety classifiers, but does not store general API traffic

Web/Console Usage

  • Retention: Conversations retained for 90 days
  • Training: Not used for model training. User feedback (thumbs up/down) may be used.
  • Deletion: Users can delete conversations at any time

What This Means for You

Anthropic's API has the strongest privacy posture of the major providers. Zero retention means even if a developer accidentally sends a secret, it's not stored. Claude Code (the CLI tool) uses the API, so it gets zero-retention treatment.

Google (Gemini, Vertex AI)

Gemini API (Vertex AI)

  • Retention: Varies by configuration; customers can set retention periods
  • Training: Not used for training on Vertex AI
  • DPA: Available through Google Cloud / Workspace agreements

Gemini Web/App

  • Retention: Up to 18 months by default
  • Training: May be used to improve products (depends on plan and settings)
  • Opt-out: Activity controls in Google account settings

What This Means for You

Google's landscape is fragmented. Vertex AI (enterprise) has strong controls, but the consumer Gemini app has long retention and potential training usage. Make sure your developers use the enterprise API, not the consumer app.

GitHub Copilot

Copilot Individual

  • Retention: Code suggestions processed in real time, not stored long-term
  • Training: Opt-out available in settings; telemetry data may be collected
  • Context: Copilot sends the current file and open tabs to GitHub's servers

Copilot Business/Enterprise

  • Retention: No code is retained after suggestion generation
  • Training: No code used for training
  • DPA: Available for Business/Enterprise plans
  • IP indemnity: Available on Enterprise plans

What This Means for You

Copilot Business/Enterprise has the strongest guarantees. Individual plans have weaker protections. The risk with Copilot is scope — it automatically sends context from open files, so a developer with a .env file open in another tab might send it without realizing.

Cursor

Cursor routes requests through upstream AI providers (OpenAI, Anthropic). This means:

  • Data retention depends on which model you're using (GPT-4 → OpenAI's policy, Claude → Anthropic's policy)
  • Cursor itself states it doesn't store or train on code
  • Your exposure is the same as with any API proxy: it is governed entirely by the upstream provider's policy

What This Means for You

Configure Cursor to use Anthropic's Claude (zero retention) for the strongest privacy posture. Or route through a local scanning proxy like AxSentinel regardless of provider.
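For tools that use the OpenAI-compatible API, one common way to route traffic through a local proxy is to override the SDK's base URL. Here's a minimal sketch; the proxy address (`http://localhost:8080/v1`) is a hypothetical example, not a documented AxSentinel endpoint:

```python
import os

# Point the official OpenAI Python SDK (v1+) at a local scanning proxy.
# The SDK reads OPENAI_BASE_URL, so every request passes through the
# proxy before leaving your machine. The URL below is illustrative.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8080/v1"

# from openai import OpenAI
# client = OpenAI()  # requests now go to the local proxy first
```

The same pattern works for any tool that honors a configurable base URL, which is what makes a local proxy provider-agnostic.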

The Fundamental Problem

Even the best data retention policy doesn't protect against the core risk: the developer didn't know the secret was in the prompt.

  • OpenAI's 30-day retention means your AWS key sits in their systems for a month
  • Even Anthropic's zero retention can't un-send a database password
  • A credential doesn't need to be "retained" to be compromised — it was transmitted over the network

The Solution: Don't Send It in the First Place

Data retention policies are damage mitigation. Prevention means scanning every AI request before it leaves your machine:

  1. Regex scanning catches known patterns (AWS keys, SSNs, credit cards) in microseconds
  2. ML scanning catches unknown patterns (custom API keys, names in context, encoded secrets)
  3. Local processing means no data is sent to yet another third party for scanning
  4. Block mode stops the request entirely; redact mode strips the sensitive data and forwards the rest
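The regex stage with block and redact modes can be sketched in a few lines. The patterns and names below are illustrative, not AxSentinel's actual rule set:

```python
import re

# Illustrative pre-flight patterns (real scanners ship far larger rule sets).
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan(prompt: str, mode: str = "block"):
    """Return (allowed, prompt). Block mode rejects the whole request on
    any match; redact mode masks matches and forwards the rest."""
    hits = [(name, rx) for name, rx in PATTERNS.items() if rx.search(prompt)]
    if not hits:
        return True, prompt
    if mode == "block":
        return False, prompt
    redacted = prompt
    for name, rx in hits:
        redacted = rx.sub(f"[REDACTED:{name}]", redacted)
    return True, redacted
```

In redact mode, `scan("deploy key AKIAABCDEFGHIJKLMNOP", mode="redact")` forwards the prompt with the key replaced by `[REDACTED:aws_access_key]`; in block mode the same input is rejected outright.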

This is what AxSentinel does. It works with every provider on this list — OpenAI, Anthropic, Google, GitHub Copilot, Cursor, and any other tool that uses HTTP API calls.

Protect your code regardless of provider policy →