ChatGPT Export Studio - Usage Guide

Quick Start

Want the easiest path? Double-click start_export_studio_gui.bat (Windows) or start_export_studio_gui.sh (macOS/Linux) to auto-create a virtualenv, install dependencies, and launch the GUI. Then jump to Explore Your Data below.

1. Getting Your ChatGPT Export

Go to ChatGPT Settings → Data Controls → Export Data
Wait for the email with your export ZIP
Download the ZIP file (typically named chat-data-export-yyyy-mm-dd.zip)

2. Import Your Export

python3 export_studio.py import /path/to/your-export.zip

This will:

Extract the ZIP
Find conversations.json
Parse and normalize all conversations
Store in SQLite database with FTS5 index
Extract metadata (intent, topics, flags)

3. Explore Your Data

List Conversations

python3 export_studio.py list --limit 20

Search Messages

# Search for specific topics
python3 export_studio.py search "machine learning" --limit 10

# Search for code snippets
python3 export_studio.py search "python def" --limit 5

Launch GUI

python3 export_studio.py gui

The GUI provides:

Browse all conversations
Search with instant results
View messages in chronological order
Export datasets with one click

Model Foundry

Chunking for RAG

Before using chunks for semantic search or embeddings:

python3 export_studio.py chunk

This creates overlapping chunks of 800-1200 tokens (estimated) suitable for:

Retrieval Augmented Generation (RAG)
Embedding generation
Context-aware search

Export Clean Corpus

python3 export_studio.py export corpus ./my_corpus

Creates:

corpus.jsonl: Structured records with role, intent, topics, timestamps
corpus.txt: Plain text format with separators
manifest.json: Export metadata and checksums

Use cases:

Training language models
Fine-tuning on your writing style
Analysis and statistics

Export SSR Dataset

python3 export_studio.py export ssr ./my_ssr

Creates:

ssr.jsonl: Full Structured Semantic Records with all metadata
Schema version tracked for reproducibility

SSR includes:

Stable IDs
Parent-child relationships
Intent classification
Topic extraction
Content hashes
Temporal information

Export Training Pairs

python3 export_studio.py export pairs ./my_pairs

Creates:

pairs.jsonl: Question-answer pairs mined from conversations

Format:

{
  "id": "pair_xxx_yyy",
  "a": "user question",
  "b": "assistant answer",
  "label": 1,
  "type": "qa",
  "meta": {"conversation_id": "...", "intent": "question"}
}

Use cases:

Supervised fine-tuning
Question-answering models
Instruction following

Export Contrastive Triples

python3 export_studio.py export triples ./my_triples

Creates:

triples.jsonl: Anchor, positive, negative triplets

Format:

{
  "anchor": "user message",
  "positive": "correct assistant response",
  "negative": "unrelated response from different conversation",
  "meta": {"anchor_id": "...", "pos_id": "..."}
}

Use cases:

Contrastive learning
Embedding model training
Semantic similarity models

Advanced Usage

Database Location

Specify custom database:

python3 export_studio.py --db /path/to/my_database.db list

Default: export_studio.db in current directory

Re-importing

Force re-import (if you've edited the export):

python3 export_studio.py import my-export.zip --force

Without --force, duplicate exports (same hash) are skipped.

Metadata Extraction

Metadata is extracted automatically using deterministic heuristics:

Intent Detection:

question: Contains "?" or starts with interrogatives (what, why, how)
instruction: Starts with imperative verbs (build, create, make)
explanation: Contains because/therefore/means
plan: Contains plan/roadmap/milestone keywords
other: Default fallback

Flags:

is_question: Question marks or interrogative starters
is_code: Code fences (```) or high keyword density
is_list: Multiple lines starting with -, *, or numbers
has_steps: Numbered steps or "Step N" patterns

Topics:

Top 10 keywords after removing stopwords
Deterministic, reproducible

PII Redaction

The PIIRedactor class automatically detects and redacts:

Email addresses → [REDACTED_EMAIL_N]
Phone numbers → [REDACTED_PHONE]
SSN patterns → [REDACTED_SSN]

Future enhancement: Add --redact flag to export commands.

Building Windows Executable

Prerequisites

pip install pyinstaller

Build

pyinstaller export_studio.spec

Output: dist/ExportStudio.exe (single-file executable)

Run

ExportStudio.exe gui
ExportStudio.exe import my-export.zip
ExportStudio.exe list

Tips & Best Practices

1. Organize with Projects (Future)

The database schema supports projects for organizing conversations:

Group related conversations
Track exports per project
Version control your datasets

2. Chunking Strategy

Default: 800-1200 tokens, 15% overlap

Good for:

Embedding generation (512-1024 token models)
RAG retrieval
Context windows

3. Export Workflow

Recommended workflow:

Import export ZIP
Review conversations in GUI
Chunk conversations
Export corpus for analysis
Export pairs/triples for training
Export SSR for archival

4. Search Tips

FTS5 supports:

Phrase search: "exact phrase"
Boolean: python AND machine learning
Prefix: embed* matches embed, embeddings, embedded

5. Reproducibility

Every export includes:

Input hash (source data)
Config hash (parameters)
Output hash (generated data)
Timestamps
Record counts

This ensures:

Reproducible pipelines
Traceable artifacts
Auditable datasets

Troubleshooting

Import fails with "conversations.json not found"

Ensure your ZIP contains conversations.json at the root or in a subdirectory.

Search returns no results

Check:

Was the import successful?
Are messages in the database? python3 export_studio.py list
Try simpler queries first

GUI doesn't launch

Ensure Tkinter is installed:

Ubuntu: sudo apt-get install python3-tk
macOS: Included with Python
Windows: Included with Python

Database is locked

Close other connections to the database. Only one write connection at a time.

Performance

Import: ~1000 conversations/second
Search: FTS5 is fast, even with millions of messages
Chunking: ~500 conversations/second
Export: Limited by disk I/O

For very large exports (>100k conversations):

Consider batching exports
Use SSD storage
Increase system memory

Next Steps

✅ Import your export
✅ Explore with GUI or CLI
✅ Chunk for RAG
✅ Export datasets
🚧 Add local embeddings (future enhancement)
🚧 Implement hybrid search (future enhancement)
🚧 Train custom models on your data

Support

Documentation: README.md
Issues: GitHub Issues
Source: GitHub Repository

Privacy First: All processing happens locally. No data leaves your machine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ChatGPT Export Studio - Usage Guide

Quick Start

1. Getting Your ChatGPT Export

2. Import Your Export

3. Explore Your Data

List Conversations

Search Messages

Launch GUI

Model Foundry

Chunking for RAG

Export Clean Corpus

Export SSR Dataset

Export Training Pairs

Export Contrastive Triples

Advanced Usage

Database Location

Re-importing

Metadata Extraction

PII Redaction

Building Windows Executable

Prerequisites

Build

Run

Tips & Best Practices

1. Organize with Projects (Future)

2. Chunking Strategy

3. Export Workflow

4. Search Tips

5. Reproducibility

Troubleshooting

Import fails with "conversations.json not found"

Search returns no results

GUI doesn't launch

Database is locked

Performance

Next Steps

Support

FilesExpand file tree

USAGE.md

Latest commit

History

USAGE.md

File metadata and controls

ChatGPT Export Studio - Usage Guide

Quick Start

1. Getting Your ChatGPT Export

2. Import Your Export

3. Explore Your Data

List Conversations

Search Messages

Launch GUI

Model Foundry

Chunking for RAG

Export Clean Corpus

Export SSR Dataset

Export Training Pairs

Export Contrastive Triples

Advanced Usage

Database Location

Re-importing

Metadata Extraction

PII Redaction

Building Windows Executable

Prerequisites

Build

Run

Tips & Best Practices

1. Organize with Projects (Future)

2. Chunking Strategy

3. Export Workflow

4. Search Tips

5. Reproducibility

Troubleshooting

Import fails with "conversations.json not found"

Search returns no results

GUI doesn't launch

Database is locked

Performance

Next Steps

Support